Screen for motifs in a database given a response variable
Usage
screen_pwm(
  sequences,
  response,
  metric = NULL,
  dataset = all_motif_datasets(),
  motifs = NULL,
  parallel = getOption("prego.parallel", TRUE),
  only_best = FALSE,
  prior = 0.01,
  alternative = "two.sided",
  ...
)
Arguments
- sequences
a character vector of DNA sequences.
- response
a vector with the response variable for each sequence. If the response is a matrix (one row per sequence), the average across its columns will be used.
- metric
metric used to choose the best motif: one of 'ks' or 'r2'. If NULL, the default is 'ks' for binary response variables and 'r2' for continuous ones.
- dataset
a data frame with PSSMs ('A', 'C', 'G' and 'T' columns) and an additional column 'motif' containing the motif name, for example HOMER_motifs or JASPAR_motifs, or all_motif_datasets(), or a MotifDB object.
- motifs
names of specific motifs to extract from the dataset
- parallel
logical, whether to use parallel processing
- only_best
logical; if TRUE, return only the best motif (the one with the highest score). If FALSE, all motifs are returned.
- prior
a prior probability for each nucleotide.
- alternative
alternative hypothesis for the KS test. One of 'two.sided', 'less' or 'greater'.
- ...
Arguments passed on to compute_pwm
  - pssm
    a PSSM matrix or data frame. The columns of the matrix or data frame should be named with the nucleotides ('A', 'C', 'G' and 'T').
  - spat
    a data frame with the spatial model (as returned in the $spat slot of the regression). It should contain a column called 'bin' and a column called 'spat_factor'.
  - spat_min
    the minimum position to use from the sequences. The default is 1.
  - spat_max
    the maximum position to use from the sequences. The default is the length of the sequences.
  - bidirect
    is the motif bi-directional? If TRUE, the reverse complement of the motif will be used as well.
  - func
    the function used to combine the PWM scores along each sequence: either 'logSumExp' or 'max'. The default is 'logSumExp'.
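To make prior, bidirect, spat_min/spat_max and func more concrete, here is a minimal base-R sketch of scoring one sequence against a PSSM. It is not prego's compute_pwm(). It assumes that prior is a pseudocount added to every probability before renormalizing, that a window's score is the sum of the log-probabilities of its bases, that only A/C/G/T occur in the sequence, and that func combines the window scores across positions (and across both strands when bidirect is TRUE). The spatial model (spat) is not modeled. All function names below are hypothetical.

```r
# Hypothetical sketch of PSSM scoring; NOT prego's compute_pwm().
# Assumption: `prior` is a pseudocount added to each probability, then rows are renormalized.
log_pssm <- function(pssm, prior = 0.01) {
  m <- as.matrix(pssm[, c("A", "C", "G", "T")]) + prior
  log(m / rowSums(m))
}

# Reverse complement of a DNA string (assumes only A/C/G/T)
rev_comp <- function(s) {
  comp <- c(A = "T", C = "G", G = "C", T = "A")
  paste(rev(comp[strsplit(s, "")[[1]]]), collapse = "")
}

# Numerically stable log(sum(exp(x)))
log_sum_exp <- function(x) {
  m <- max(x)
  m + log(sum(exp(x - m)))
}

# Log-score of every window of width nrow(lp) along the sequence s
window_scores <- function(s, lp) {
  bases <- strsplit(s, "")[[1]]
  w <- nrow(lp)
  n <- length(bases) - w + 1
  if (n < 1) return(numeric(0))
  vapply(seq_len(n), function(i) {
    # pick, for each motif position, the log-probability of the observed base
    sum(lp[cbind(seq_len(w), match(bases[i:(i + w - 1)], colnames(lp)))])
  }, numeric(1))
}

score_sequence_sketch <- function(s, pssm, prior = 0.01, bidirect = TRUE,
                                  func = c("logSumExp", "max"),
                                  spat_min = 1, spat_max = NULL) {
  func <- match.arg(func)
  if (is.null(spat_max)) spat_max <- nchar(s)
  lp <- log_pssm(pssm, prior)
  # keep only the positions between spat_min and spat_max
  s <- toupper(substr(s, spat_min, spat_max))
  scores <- window_scores(s, lp)
  # optionally scan the reverse-complement strand as well
  if (bidirect) scores <- c(scores, window_scores(rev_comp(s), lp))
  if (func == "max") max(scores) else log_sum_exp(scores)
}

# Toy two-position motif that prefers "AG"
pssm <- data.frame(A = c(0.9, 0.1), C = c(0.05, 0.1), G = c(0.025, 0.7), T = c(0.025, 0.1))
score_sequence_sketch("TTAGTT", pssm, func = "max")
score_sequence_sketch("TTAGTT", pssm, func = "logSumExp")
```

With func = "max" only the best-matching window counts, while "logSumExp" acts as a soft maximum, so several moderate matches can add up to a higher score.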
Value
a data frame with the following columns:
- motif:
the motif name.
- score:
the score of the motif (depending on metric).
If only_best is TRUE, only the best motif is returned (a data frame with a single row).
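The ranking step behind this return value can be sketched in base R as follows, given one numeric score per sequence for each motif. This is a hypothetical illustration, not prego's code: it assumes that a matrix response is reduced with rowMeans(), that 'ks' is the two-sample Kolmogorov-Smirnov statistic D between the scores of sequences with response 1 and response 0 (computed with stats::ks.test, which also takes the alternative argument), and that 'r2' is the squared Pearson correlation. The function name screen_scores_sketch() and the simulated data are hypothetical.

```r
# Hypothetical sketch of the screening/ranking logic; NOT prego's implementation.
# motif_scores: numeric matrix, one row per sequence, one named column per motif.
screen_scores_sketch <- function(motif_scores, response, metric = NULL,
                                 only_best = FALSE, alternative = "two.sided") {
  # Assumption: a matrix response is averaged into one value per sequence
  if (is.matrix(response)) response <- rowMeans(response)
  # Default metric: 'ks' for a binary (0/1) response, 'r2' otherwise
  if (is.null(metric)) {
    metric <- if (all(response %in% c(0, 1))) "ks" else "r2"
  }
  metric <- match.arg(metric, c("ks", "r2"))
  score <- apply(motif_scores, 2, function(x) {
    if (metric == "ks") {
      # KS statistic D between scores of positive and negative sequences
      unname(suppressWarnings(
        stats::ks.test(x[response == 1], x[response == 0],
                       alternative = alternative)$statistic
      ))
    } else {
      # squared Pearson correlation between motif score and response
      stats::cor(x, response)^2
    }
  })
  res <- data.frame(motif = colnames(motif_scores), score = unname(score))
  # highest score first, as in the examples on this page
  res <- res[order(res$score, decreasing = TRUE), , drop = FALSE]
  rownames(res) <- NULL
  if (only_best) res[1, , drop = FALSE] else res
}

set.seed(1)
y <- rep(c(1, 0), each = 50)  # binary response, so 'ks' is chosen
scores <- cbind(motif_a = rnorm(100, mean = 2 * y), motif_b = rnorm(100))
screen_scores_sketch(scores, y)
screen_scores_sketch(scores, y, only_best = TRUE)
```

In this simulation only motif_a depends on the response, so it should be ranked first.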
Examples
res_screen <- screen_pwm(cluster_sequences_example, cluster_mat_example[, 1])
#> ℹ Performing PWM screening
head(res_screen)
#> # A tibble: 6 x 2
#> motif score
#> 1 HOCOMOCO.HNF1B_HUMAN.H11MO.0.A 0.8606183
#> 2 HOCOMOCO.HNF1B_MOUSE.H11MO.0.A 0.8510730
#> 3 JASPAR.HNF1A 0.8510730
#> 4 JOLMA.HNF1A_di_full 0.8505374
#> 5 JOLMA.HNF1B_di_full_1 0.8484232
#> 6 JOLMA.HNF1B_di_full_2 0.8484090
# print the full result (set only_best = TRUE to keep only the best match)
screen_pwm(cluster_sequences_example, cluster_mat_example[, 1])
#> ℹ Performing PWM screening
#> # A tibble: 3,867 x 2
#> motif score
#> 1 HOCOMOCO.HNF1B_HUMAN.H11MO.0.A 0.8606183
#> 2 HOCOMOCO.HNF1B_MOUSE.H11MO.0.A 0.8510730
#> 3 JASPAR.HNF1A 0.8510730
#> 4 JOLMA.HNF1A_di_full 0.8505374
#> 5 JOLMA.HNF1B_di_full_1 0.8484232
#> 6 JOLMA.HNF1B_di_full_2 0.8484090
#> # ... with 3,861 more rows
# with r^2 metric
res_screen <- screen_pwm(sequences_example, response_mat_example[, 1], metric = "r2")
#> ℹ Performing PWM screening
head(res_screen)
#> # A tibble: 6 x 2
#> motif score
#> 1 JASPAR.SOX2 0.04355104
#> 2 JASPAR.SUT1 0.04011911
#> 3 JASPAR.SOX13 0.03979196
#> 4 JOLMA.IRX3_di_DBD 0.03947399
#> 5 JASPAR.dsx 0.03903627
#> 6 JASPAR.Sox3 0.03876807