Extract pwm of intervals from a motif database — gextract_pwm

Extract the pwm of each interval for each motif from a motif database. gextract_pwm_old is an older version of this function, which is slower, and returns slightly different results due to float percision instead of double.

Usage

gextract_pwm_old(
  intervals,
  motifs = NULL,
  dataset = all_motif_datasets(),
  spat = NULL,
  spat_min = 1,
  spat_max = NULL,
  bidirect = TRUE,
  prior = 0.01,
  func = "logSumExp",
  parallel = getOption("prego.parallel", TRUE)
)

gextract_pwm(
  intervals,
  motifs = NULL,
  dataset = MOTIF_DB,
  spat = NULL,
  spat_min = 1,
  spat_max = NULL,
  bidirect = TRUE,
  prior = 0.01,
  func = "logSumExp",
  parallel = getOption("prego.parallel", TRUE)
)

Arguments

intervals: misha intervals set
motifs: names of specific motifs to extract from the dataset
dataset: a data frame with PSSMs ('A', 'C', 'G' and 'T' columns), with an additional column 'motif' containing the motif name, for example HOMER_motifs or JASPAR_motifs, or all_motif_datasets(), or a MotifDB object.
spat: a data frame with the spatial model (as returned from the $spat slot from the regression). Should contain a column called 'bin' and a column called 'spat_factor'.
spat_min: the minimum position to use from the sequences. The default is 1.
spat_max: the maximum position to use from the sequences. The default is the length of the sequences.
bidirect: is the motif bi-directional. If TRUE, the reverse-complement of the motif will be used as well.
prior: a prior probability for each nucleotide.
func: the function to use to combine the PWMs for each sequence. Either 'logSumExp' or 'max'. The default is 'logSumExp'.
parallel: logical, whether to use parallel processing

Value

The intervals set with additional columns per motif, containing the pwm of each interval for each motif

Examples

if (FALSE) { # \dontrun{
library(misha)
gdb.init_examples()
pwms <- gextract_pwm(gintervals.load("annotations"))
pwms[, 1:20]
} # }