R Parity Reference
pyprego is a Python port of the prego R package. This page documents the mapping between R and Python functions.
Function Mapping
Core Regression
| R function | Python function | Notes |
| --- | --- | --- |
| regress_pwm() | pyprego.regress_pwm() | Full feature parity |
| regress_multiple_motifs() | pyprego.regress_multiple_motifs() | Also accessible via regress_pwm(motif_num=N) |
| regress_pwm_clusters() | pyprego.regress_pwm_clusters() | Per-cluster regression |
| regress_pwm_cv() | pyprego.regress_pwm_cv() | Cross-validated regression |
PWM Scoring
| R function | Python function | Notes |
| --- | --- | --- |
| compute_pwm() | pyprego.compute_pwm() | logSumExp and max aggregation |
| compute_local_pwm() | pyprego.compute_local_pwm() | Per-position scores |
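The two aggregation modes named in the notes above can be illustrated with a minimal sketch. This is not pyprego's implementation: the function names `aggregate_max` and `aggregate_logsumexp` and the example scores are hypothetical, and the input is assumed to be a plain list of per-position log-scores for one sequence.

```python
import math

def aggregate_max(position_scores):
    """Return the single best per-position log-score (hypothetical helper)."""
    return max(position_scores)

def aggregate_logsumexp(position_scores):
    """Return log(sum(exp(s))) over all positions (hypothetical helper).

    The maximum is factored out first so that exp() never overflows.
    """
    m = max(position_scores)
    return m + math.log(sum(math.exp(s - m) for s in position_scores))

# Illustrative per-position log-scores, not real pyprego output.
scores = [-12.0, -3.5, -8.25, -3.5]
total_max = aggregate_max(scores)
total_lse = aggregate_logsumexp(scores)
```

The logSumExp total is never smaller than the max total, because it adds the contribution of every position rather than only the strongest one.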
K-mer Operations
| R function | Python function | Notes |
| --- | --- | --- |
| screen_kmers() | pyprego.screen_kmers() | Vectorized implementation |
| generate_kmers() | pyprego.generate_kmers() | Including gapped k-mers |
| kmer_matrix() | pyprego.kmer_matrix() | K-mer count matrix |
| kmers_to_pssm() | pyprego.kmers_to_pssm() | K-mer to PSSM conversion |
PSSM Utilities
| R function | Python function | Notes |
| --- | --- | --- |
| pssm_cor() | pyprego.pssm_cor() | Best-alignment correlation |
| pssm_diff() | pyprego.pssm_diff() | Best-alignment distance |
| pssm_match() | pyprego.pssm_match() | Database matching |
| pssm_trim() | pyprego.pssm_trim() | Trim low-info edges |
| pssm_rc() | pyprego.pssm_rc() | Reverse complement |
| bits_per_pos() | pyprego.bits_per_pos() | Information content |
| consensus_from_pssm() | pyprego.consensus_from_pssm() | Consensus with IUPAC codes |
| pssm_quantile() | pyprego.pssm_quantile() | Empirical score quantile |
| pssm_dataset_cor() | pyprego.pssm_dataset_cor() | Dataset-level correlation |
| pssm_dataset_diff() | pyprego.pssm_dataset_diff() | Dataset-level distance |
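For readers unfamiliar with two of the operations listed above, the following sketch shows what "information content" and "reverse complement" conventionally mean for a PSSM. It is an illustration only and does not call pyprego: the helpers `info_content_bits` and `reverse_complement` are hypothetical, the PSSM is represented as a plain list of per-position probability dicts, and the information content assumes a uniform background (2 bits minus the Shannon entropy), which may differ from what bits_per_pos() computes.

```python
import math

def info_content_bits(row):
    """Information content of one position: 2 bits minus Shannon entropy.

    Assumes a uniform background over A, C, G, T (hypothetical helper).
    """
    entropy = -sum(p * math.log2(p) for p in row.values() if p > 0)
    return 2.0 - entropy

def reverse_complement(pssm):
    """Reverse the position order and swap A<->T and C<->G (hypothetical helper)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return [{comp[base]: p for base, p in row.items()} for row in reversed(pssm)]

# Illustrative three-position PSSM; the values are made up for the example.
pssm = [
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    {"A": 0.0, "C": 0.5, "G": 0.5, "T": 0.0},
]
bits = [info_content_bits(row) for row in pssm]
rc = reverse_complement(pssm)
```

Applying `reverse_complement` twice returns the original matrix, which is a quick sanity check for any implementation of this operation.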
Motif Database
| R function | Python function | Notes |
| --- | --- | --- |
| create_motif_db() | pyprego.create_motif_db() | Build MotifDB from sequences |
| extract_pwm() | pyprego.extract_pwm() | Extract PSSM from MotifDB |
| screen_pwm() | pyprego.screen_pwm() | Score sequences against all motifs in a DB |
| motif_enrichment() | pyprego.motif_enrichment() | Motif enrichment analysis |
| all_motif_datasets() | pyprego.all_motif_datasets() | List bundled datasets |
| get_motif_pssm() | pyprego.get_motif_pssm() | Retrieve PSSM by name from bundled dataset |
Visualization
| R function | Python function | Notes |
| --- | --- | --- |
| plot_pssm_logo() | pyprego.plot_pssm_logo() | Uses logomaker or bar-chart fallback |
| (no direct R equivalent) | pyprego.plot_spat_model() | Spatial model visualization |
| (no direct R equivalent) | pyprego.plot_regression_prediction() | Prediction scatter plot |
| (no direct R equivalent) | pyprego.plot_regression_qc() | Combined QC plot |
Genomic Integration
| R function | Python function | Notes |
| --- | --- | --- |
| intervals_to_seq() | pyprego.intervals_to_seq() | Requires pymisha |
| gextract_pwm() | pyprego.gextract_pwm() | Requires pymisha |
| gextract_local_pwm() | pyprego.gextract_local_pwm() | Requires pymisha |
| gextract_pwm_quantile() | pyprego.gextract_pwm_quantile() | Requires pymisha |
| gintervals_center_by_pssm() | pyprego.gintervals_center_by_pssm() | Requires pymisha |
Export/Import
| R function | Python function | Notes |
| --- | --- | --- |
| export_regression_model() | pyprego.export_regression_model() | JSON serialization |
| load_regression_model() | pyprego.load_regression_model() | JSON deserialization |
| export_multi_regression() | pyprego.export_multi_regression() | Multi-motif export |
| load_multi_regression() | pyprego.load_multi_regression() | Multi-motif import |
Data Structure Mapping
| R | Python | Notes |
| --- | --- | --- |
| Named numeric matrix (PSSM) | pd.DataFrame with columns pos, A, C, G, T | Use pyprego.pssm_dataframe() to create |
| Named numeric vector (spatial) | pd.DataFrame with columns bin, spat_factor | Use pyprego.spatial_dataframe() to create |
| S4 MotifDB object | pyprego.MotifDB class | Same stacked-matrix internal representation |
| Named list (regression result) | pyprego.RegressionResult dataclass | Has a .predict() method |
| Named list (multi result) | pyprego.MultiRegressionResult dataclass | Has .predict() and .predict_multi() methods |
Parameter Name Differences
Most parameters use the same names as the R package, with underscores replacing dots (Python convention):
| R parameter | Python parameter |
| --- | --- |
| motif.length | motif_length |
| score.metric | score_metric |
| spat.bin.size | spat_bin_size |
| spat.num.bins | spat_num_bins |
| improve.epsilon | improve_epsilon |
| min.nuc.prob | min_nuc_prob |
| unif.prior | unif_prior |
| num.folds | num_folds |
| log.energy | log_energy |
| optimize.pwm | optimize_pwm |
| optimize.spat | optimize_spat |
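When porting an existing R call, the dot-to-underscore rule above can be applied mechanically. The sketch below is a convenience illustration, not part of pyprego: the helpers `r_to_python_param` and `translate_kwargs` are hypothetical, and the argument values are made up.

```python
def r_to_python_param(name):
    """Convert an R-style dotted parameter name to its underscore form."""
    return name.replace(".", "_")

def translate_kwargs(r_kwargs):
    """Rename every key of an R-style argument dict; values are left untouched."""
    return {r_to_python_param(key): value for key, value in r_kwargs.items()}

# Illustrative argument values only; they are not recommended defaults.
r_args = {"motif.length": 12, "spat.bin.size": 40, "unif.prior": 0.05}
py_args = translate_kwargs(r_args)
```

The resulting dict could then be unpacked into a Python call with `**py_args`, assuming the target function accepts those keyword names.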
Numerical Equivalence
pyprego aims for close numerical agreement with the R prego package. Small numerical differences may still occur due to:
- Different random number generators (NumPy vs R's RNG).
- Different linear algebra backends (LAPACK implementations).
- Slightly different convergence paths in the coordinate-descent optimizer.
In practice, the discovered motifs are functionally equivalent: the same consensus sequences, comparable R-squared values, and the same biological interpretation.
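One way to check agreement between an R result and a Python result is to compare the two matrices element-wise within a tolerance rather than for exact equality. The sketch below assumes both PSSMs have already been exported as plain nested lists of probabilities; the helper `pssm_close`, the tolerance values, and the example numbers are all hypothetical and are not taken from either package.

```python
import math

def pssm_close(a, b, rel_tol=1e-6, abs_tol=1e-8):
    """Return True if two PSSMs have the same shape and all entries agree
    within the given tolerances (hypothetical helper, assumed tolerances)."""
    if len(a) != len(b):
        return False
    for row_a, row_b in zip(a, b):
        if len(row_a) != len(row_b):
            return False
        for x, y in zip(row_a, row_b):
            if not math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol):
                return False
    return True

# Made-up values standing in for an R-derived and a Python-derived PSSM.
r_pssm = [[0.7, 0.1, 0.1, 0.1], [0.25, 0.25, 0.25, 0.25]]
py_pssm = [[0.7 + 1e-10, 0.1, 0.1, 0.1 - 1e-10], [0.25, 0.25, 0.25, 0.25]]
different = [[0.6, 0.2, 0.1, 0.1], [0.25, 0.25, 0.25, 0.25]]

same_within_tol = pssm_close(r_pssm, py_pssm)
differs = not pssm_close(r_pssm, different)
```

Looser tolerances may be appropriate when the two runs start from different random initializations, since the optimizer can then settle on slightly different solutions.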