Extract quantiles of pwm of intervals from a motif database
gextract_pwm.quantile.Rd
For each interval, extract the genome-wide quantile of its PWM score for each motif, given the interval's length. Note that the quantiles are computed separately for each motif, so this might be slow for intervals whose lengths have not been normalized.
Usage
gextract_pwm.quantile(
intervals,
motifs = NULL,
dataset = MOTIF_DB,
percision = 0.01,
spat = NULL,
spat_min = 1,
spat_max = NULL,
bidirect = TRUE,
prior = 0.01,
func = "logSumExp",
n_sequences = 10000,
dist_from_edge = 3000000,
chromosomes = NULL,
parallel = getOption("prego.parallel", TRUE)
)
Arguments
- intervals
misha intervals set
- motifs
names of specific motifs to extract from the dataset
- dataset
a data frame with PSSMs ('A', 'C', 'G' and 'T' columns), with an additional column 'motif' containing the motif name, for example HOMER_motifs, JASPAR_motifs or all_motif_datasets(), or a MotifDB object.
- percision
the precision of the quantiles. The default is 0.01, which means that the quantiles are computed in steps of 1% of the PWM score distribution.
- spat
a data frame with the spatial model (as returned from the $spat slot from the regression). Should contain a column called 'bin' and a column called 'spat_factor'.
- spat_min
the minimum position to use from the sequences. The default is 1.
- spat_max
the maximum position to use from the sequences. The default is the length of the sequences.
- bidirect
whether the motif is bi-directional. If TRUE, the reverse-complement of the motif will be used as well.
- prior
a prior probability for each nucleotide.
- func
the function to use to combine the PWMs for each sequence. Either 'logSumExp' or 'max'. The default is 'logSumExp'.
- n_sequences
number of sequences to sample in order to compute the quantiles. The default is 1e4.
- dist_from_edge
The minimum distance from the edge of the chromosome for a region to start or end (default: 3e6)
- chromosomes
The chromosomes to sample from (default: all chromosomes)
- parallel
logical, whether to use parallel processing
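To illustrate the two options accepted by `func`, the following base R sketch combines per-position log-scores of a single sequence in both ways. The score vector and the helper `log_sum_exp` are hypothetical and chosen only for illustration; this is not necessarily how prego implements the computation internally.

```r
# Hypothetical per-position log-scores of one motif along one sequence
# (illustrative values, not produced by prego)
pos_scores <- c(-12.4, -9.8, -15.1, -7.2, -11.0)

# Numerically stable log(sum(exp(x))): subtract the maximum before exponentiating
# (hypothetical helper, defined here for the example only)
log_sum_exp <- function(x) {
    m <- max(x)
    m + log(sum(exp(x - m)))
}

score_lse <- log_sum_exp(pos_scores) # integrates evidence over all positions
score_max <- max(pos_scores)         # keeps only the best-scoring position

# logSumExp is never smaller than the maximum
score_lse >= score_max
```

Under these assumptions, 'logSumExp' rewards sequences with several moderate matches, while 'max' reflects only the single strongest match.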
Value
a data frame with the PWM quantiles for each interval and motif. The quantile columns are named {motif}.q
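As a rough illustration of what a quantile column can represent, the base R sketch below maps scores to quantiles of a sampled background distribution. The simulated background, the interval scores and the column name are assumptions made for the example; the actual function samples genomic sequences (see `n_sequences`) and may compute the mapping differently.

```r
set.seed(1)

# Hypothetical background: scores of one motif on 10000 sampled sequences
# (simulated here; the real function samples sequences from the genome)
background <- rnorm(10000, mean = -20, sd = 3)

# Quantile breakpoints in steps of 0.01 (the role played by `percision`)
step <- 0.01
probs <- seq(0, 1, by = step)
breaks <- quantile(background, probs = probs, names = FALSE)

# Hypothetical scores of three intervals for the same motif
interval_scores <- c(-26, -20, -12)

# Map each score to the highest breakpoint it reaches;
# scores below the background minimum fall back to quantile 0
idx <- findInterval(interval_scores, breaks)
q <- probs[pmax(idx, 1)]

# Column name follows the {motif}.q pattern with an example motif name
res <- data.frame(score = interval_scores, JASPAR.CDX1.q = q)
res
```

In this sketch a higher quantile means the interval scores above a larger fraction of the background sequences.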
Examples
if (FALSE) { # \dontrun{
library(misha)
gdb.init_examples()
gextract_pwm.quantile("annotations", motifs = c("JASPAR.CDX1", "JASPAR.CDX2"), dist_from_edge = 100)
} # }