R Parity Reference

pyprego is a Python port of the prego R package. This page maps R functions, data structures, and parameter names to their Python counterparts.

Function Mapping

Core Regression

R function	Python function	Notes
`regress_pwm()`	`pyprego.regress_pwm()`	Full feature parity
`regress_multiple_motifs()`	`pyprego.regress_multiple_motifs()`	Also accessible via `regress_pwm(motif_num=N)`
`regress_pwm_clusters()`	`pyprego.regress_pwm_clusters()`	Per-cluster regression
`regress_pwm_cv()`	`pyprego.regress_pwm_cv()`	Cross-validated regression

PWM Scoring

R function	Python function	Notes
`compute_pwm()`	`pyprego.compute_pwm()`	logSumExp and max aggregation
`compute_local_pwm()`	`pyprego.compute_local_pwm()`	Per-position scores
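
The two aggregation modes noted for `compute_pwm()` can be illustrated with a minimal sketch. It assumes a plain list of per-position log-scores for one sequence (the kind of vector `compute_local_pwm()` is described as producing). The helper names `logsumexp` and `aggregate_scores` and the score values are hypothetical and are not part of the pyprego API.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list of floats."""
    peak = max(values)
    if math.isinf(peak):
        # All values are -inf (or one is +inf): the shifted sum is undefined.
        return peak
    return peak + math.log(sum(math.exp(v - peak) for v in values))

def aggregate_scores(local_scores, func="logSumExp"):
    """Collapse per-position log-scores into a single score for the sequence."""
    if func == "logSumExp":
        return logsumexp(local_scores)
    if func == "max":
        return max(local_scores)
    raise ValueError(f"unknown aggregation: {func!r}")

# Illustrative values only, not real pyprego output.
local_scores = [-12.0, -3.5, -9.1, -3.4]
total = aggregate_scores(local_scores, "logSumExp")
best = aggregate_scores(local_scores, "max")
```

The logSumExp result is never below the maximum and exceeds it by at most `log(n)` for `n` positions, so the two modes differ most when several positions score similarly.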

K-mer Operations

R function	Python function	Notes
`screen_kmers()`	`pyprego.screen_kmers()`	Vectorized implementation
`generate_kmers()`	`pyprego.generate_kmers()`	Including gapped k-mers
`kmer_matrix()`	`pyprego.kmer_matrix()`	K-mer count matrix
`kmers_to_pssm()`	`pyprego.kmers_to_pssm()`	K-mer to PSSM conversion
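
As a rough illustration of what k-mer enumeration and a k-mer count matrix involve, here is a standard-library sketch. It covers contiguous k-mers only (no gaps), and the names `all_kmers` and `count_kmers` are hypothetical; the actual pyprego functions may differ in signature, ordering, and output type.

```python
from itertools import product

ALPHABET = "ACGT"

def all_kmers(k):
    """Every DNA k-mer of length k, in lexicographic order."""
    return ["".join(p) for p in product(ALPHABET, repeat=k)]

def count_kmers(sequences, k):
    """Rows are sequences, columns are k-mers; overlapping hits are counted."""
    kmers = all_kmers(k)
    column = {kmer: j for j, kmer in enumerate(kmers)}
    matrix = [[0] * len(kmers) for _ in sequences]
    for i, seq in enumerate(sequences):
        seq = seq.upper()
        for start in range(len(seq) - k + 1):
            j = column.get(seq[start:start + k])
            if j is not None:  # skip windows containing N or other symbols
                matrix[i][j] += 1
    return kmers, matrix

kmers, matrix = count_kmers(["ACGTAC", "TTTT"], k=2)
```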

PSSM Utilities

R function	Python function	Notes
`pssm_cor()`	`pyprego.pssm_cor()`	Best-alignment correlation
`pssm_diff()`	`pyprego.pssm_diff()`	Best-alignment distance
`pssm_match()`	`pyprego.pssm_match()`	Database matching
`pssm_trim()`	`pyprego.pssm_trim()`	Trim low-info edges
`pssm_rc()`	`pyprego.pssm_rc()`	Reverse complement
`bits_per_pos()`	`pyprego.bits_per_pos()`	Information content
`consensus_from_pssm()`	`pyprego.consensus_from_pssm()`	Consensus with IUPAC codes
`pssm_quantile()`	`pyprego.pssm_quantile()`	Empirical score quantile
`pssm_dataset_cor()`	`pyprego.pssm_dataset_cor()`	Dataset-level correlation
`pssm_dataset_diff()`	`pyprego.pssm_dataset_diff()`	Dataset-level distance
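
Two of the operations listed above, reverse complement and per-position information content, have standard textbook definitions that can be sketched without any dependencies. The sketch assumes a PSSM stored as rows of probabilities in A, C, G, T order and a uniform background; the function names are hypothetical, and `pssm_rc()` and `bits_per_pos()` may handle priors or zero probabilities differently.

```python
import math

# Rows are positions; columns are probabilities in A, C, G, T order.
pssm = [
    [0.7, 0.1, 0.1, 0.1],
    [0.0, 1.0, 0.0, 0.0],
    [0.25, 0.25, 0.25, 0.25],
]

def reverse_complement(rows):
    """Reverse the positions and swap A<->T, C<->G.

    With A, C, G, T column order, the swap amounts to reversing each row.
    """
    return [list(reversed(row)) for row in reversed(rows)]

def information_content(rows):
    """Bits per position: 2 minus Shannon entropy, with 0 * log2(0) taken as 0."""
    bits = []
    for row in rows:
        entropy = -sum(p * math.log2(p) for p in row if p > 0)
        bits.append(2.0 - entropy)
    return bits

rc = reverse_complement(pssm)
bits = information_content(pssm)
```

A fully determined position carries 2 bits and a uniform position carries 0, which is the usual basis for trimming low-information edges.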

Motif Database

R function	Python function	Notes
`create_motif_db()`	`pyprego.create_motif_db()`	Build MotifDB from sequences
`extract_pwm()`	`pyprego.extract_pwm()`	Extract PSSM from MotifDB
`screen_pwm()`	`pyprego.screen_pwm()`	Score sequences against all motifs in a DB
`motif_enrichment()`	`pyprego.motif_enrichment()`	Motif enrichment analysis
`all_motif_datasets()`	`pyprego.all_motif_datasets()`	List bundled datasets
`get_motif_pssm()`	`pyprego.get_motif_pssm()`	Retrieve PSSM by name from bundled dataset

Visualization

R function	Python function	Notes
`plot_pssm_logo()`	`pyprego.plot_pssm_logo()`	Uses logomaker, with a bar-chart fallback
(no direct R equivalent)	`pyprego.plot_spat_model()`	Spatial model visualization
(no direct R equivalent)	`pyprego.plot_regression_prediction()`	Prediction scatter plot
(no direct R equivalent)	`pyprego.plot_regression_qc()`	Combined QC plot

Genomic Integration

R function	Python function	Notes
`intervals_to_seq()`	`pyprego.intervals_to_seq()`	Requires pymisha
`gextract_pwm()`	`pyprego.gextract_pwm()`	Requires pymisha
`gextract_local_pwm()`	`pyprego.gextract_local_pwm()`	Requires pymisha
`gextract_pwm_quantile()`	`pyprego.gextract_pwm_quantile()`	Requires pymisha
`gintervals_center_by_pssm()`	`pyprego.gintervals_center_by_pssm()`	Requires pymisha

Export/Import

R function	Python function	Notes
`export_regression_model()`	`pyprego.export_regression_model()`	JSON serialization
`load_regression_model()`	`pyprego.load_regression_model()`	JSON deserialization
`export_multi_regression()`	`pyprego.export_multi_regression()`	Multi-motif export
`load_multi_regression()`	`pyprego.load_multi_regression()`	Multi-motif import

Data Structure Mapping

R	Python	Notes
Named numeric matrix (PSSM)	`pd.DataFrame` with columns `pos`, `A`, `C`, `G`, `T`	Use `pyprego.pssm_dataframe()` to create
Named numeric vector (spatial)	`pd.DataFrame` with columns `bin`, `spat_factor`	Use `pyprego.spatial_dataframe()` to create
S4 `MotifDB` object	`pyprego.MotifDB` class	Same stacked-matrix internal representation
Named list (regression result)	`pyprego.RegressionResult` dataclass	Has a `.predict()` method
Named list (multi result)	`pyprego.MultiRegressionResult` dataclass	Has `.predict()` and `.predict_multi()` methods
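
The DataFrame layouts in the table can be pictured with a small plain-pandas sketch. The signatures of `pyprego.pssm_dataframe()` and `pyprego.spatial_dataframe()` are not shown on this page, so the frames are built directly with pandas here. All values are made up, and starting `pos` and `bin` at 0 is an assumption.

```python
import pandas as pd

# PSSM layout: one row per motif position, one probability column per nucleotide.
pssm = pd.DataFrame(
    {
        "pos": [0, 1, 2],  # assumption: the real index base may differ
        "A": [0.7, 0.1, 0.25],
        "C": [0.1, 0.7, 0.25],
        "G": [0.1, 0.1, 0.25],
        "T": [0.1, 0.1, 0.25],
    }
)

# Spatial layout: one row per spatial bin with its multiplicative factor.
spatial = pd.DataFrame(
    {
        "bin": [0, 1, 2, 3],  # assumption: bin labels are illustrative
        "spat_factor": [0.5, 1.5, 1.5, 0.5],
    }
)

# Sanity check: each position's probabilities should sum to 1.
row_sums = pssm[["A", "C", "G", "T"]].sum(axis=1)
```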

Parameter Name Differences

Most parameters use the same names as the R package, with underscores replacing dots (Python convention):

R parameter	Python parameter
`motif.length`	`motif_length`
`score.metric`	`score_metric`
`spat.bin.size`	`spat_bin_size`
`spat.num.bins`	`spat_num_bins`
`improve.epsilon`	`improve_epsilon`
`min.nuc.prob`	`min_nuc_prob`
`unif.prior`	`unif_prior`
`num.folds`	`num_folds`
`log.energy`	`log_energy`
`optimize.pwm`	`optimize_pwm`
`optimize.spat`	`optimize_spat`
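
When porting an R call, the dot-to-underscore rule can be applied mechanically. The following is a minimal sketch with hypothetical helper names and illustrative argument values. Since the rule holds for most parameters rather than all, the result should still be checked against the Python function's signature.

```python
def r_to_python_name(r_name):
    """Convert an R-style dotted parameter name to the underscore form."""
    return r_name.replace(".", "_")

def translate_kwargs(r_kwargs):
    """Rename the keys of a dict of R-style arguments; values are untouched."""
    py_kwargs = {}
    for name, value in r_kwargs.items():
        py_name = r_to_python_name(name)
        if py_name in py_kwargs:
            raise ValueError(f"duplicate parameter after renaming: {py_name}")
        py_kwargs[py_name] = value
    return py_kwargs

# Illustrative argument values, not recommended settings.
r_args = {"motif.length": 12, "spat.bin.size": 40, "unif.prior": 0.05}
py_args = translate_kwargs(r_args)
```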

Numerical Equivalence

pyprego aims for close numerical agreement with the R prego package. Minor numerical differences may occur due to:

Different random number generators (NumPy vs R's RNG).
Different linear algebra backends (LAPACK implementations).
Slightly different convergence paths in the coordinate-descent optimizer.

In practice, the discovered motifs are functionally equivalent: the same consensus sequences, comparable R-squared values, and the same biological interpretation.
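
One way to check agreement between a motif fitted in R and one fitted in Python is to compare the two PSSMs within a tolerance rather than for exact equality. The sketch below assumes both matrices have the same length and are already aligned (unlike the best-alignment comparison that `pssm_cor()` is described as performing). The matrices are invented for illustration, and the thresholds are arbitrary choices, not values recommended by pyprego.

```python
import math

def flatten(rows):
    return [value for row in rows for value in row]

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-shape matrices."""
    x, y = flatten(a), flatten(b)
    if len(x) != len(y):
        raise ValueError("matrices must have the same shape")
    return max(abs(p - q) for p, q in zip(x, y))

def pearson(a, b):
    """Pearson correlation of the flattened matrices."""
    x, y = flatten(a), flatten(b)
    if len(x) != len(y):
        raise ValueError("matrices must have the same shape")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    var_x = sum((p - mx) ** 2 for p in x)
    var_y = sum((q - my) ** 2 for q in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant matrix")
    return cov / math.sqrt(var_x * var_y)

# Invented example matrices (rows = positions, columns = A, C, G, T).
r_pssm = [[0.70, 0.10, 0.10, 0.10], [0.05, 0.85, 0.05, 0.05]]
py_pssm = [[0.69, 0.11, 0.10, 0.10], [0.05, 0.84, 0.06, 0.05]]

diff = max_abs_diff(r_pssm, py_pssm)
r = pearson(r_pssm, py_pssm)
agree = diff <= 0.02 and r > 0.99  # arbitrary thresholds for illustration
```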