Calculate Kolmogorov-Smirnov D statistics between two interval sets with motif energies

This function runs a one-sided Kolmogorov-Smirnov (KS) test between a foreground set of peaks (pssm_fg) and a background set (pssm_bg). With alternative = "less", the null hypothesis is that the foreground distribution is not less than the background distribution; use this setting when looking for motif enrichment. For anti-enrichment, use alternative = "greater". Note that in ks.test, "less" and "greater" refer to the cumulative distribution functions rather than to the values themselves; see the ks.test documentation for further details.
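
The meaning of the one-sided alternatives can be checked directly with stats::ks.test. The sketch below uses simulated energies (the vectors fg and bg are illustrative assumptions, not package data) in which the foreground is shifted toward higher values; it is not the implementation of calculate_d_stats, only an illustration of the underlying test.

```r
# Simulated motif energies (hypothetical data, not produced by the package)
set.seed(1)
bg <- rnorm(500, mean = 0)    # background energies
fg <- rnorm(500, mean = 0.5)  # foreground shifted toward higher energies

# alternative = "less": the alternative hypothesis is that the CDF of `fg`
# lies below the CDF of `bg`, which happens when `fg` values tend to be larger
res_less <- stats::ks.test(fg, bg, alternative = "less")
res_greater <- stats::ks.test(fg, bg, alternative = "greater")

d_less <- unname(res_less$statistic)        # D^-: largest gap with CDF(bg) above CDF(fg)
d_greater <- unname(res_greater$statistic)  # D^+: largest gap with CDF(fg) above CDF(bg)

# The same D^- computed by hand from the two empirical CDFs
pts <- sort(c(fg, bg))
d_manual <- max(ecdf(bg)(pts) - ecdf(fg)(pts))
```

In this simulated setting d_less is the larger of the two statistics, which is why "less" is the setting suggested for detecting enrichment.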

Usage

calculate_d_stats(
  pssm_fg,
  pssm_bg,
  fg_clustering = NULL,
  parallel = getOption("mcatac.parallel"),
  alternative = "less",
  nc = getOption("mcatac.parallel.nc")
)

Arguments

pssm_fg: motif energies calculated for a certain set of motifs on a PeakIntervals/ScPeaks/McPeaks object
pssm_bg: a background set of intervals (e.g. random genome, all ENCODE enhancers, etc.) that includes all or a subset of the motifs (columns) in pssm_fg
fg_clustering: a vector of cluster assignments for the foreground peaks (e.g. from gen_atac_peak_clust)
parallel: (optional) - whether to parallelize computations
alternative: indicates the alternative hypothesis and must be one of "two.sided", "less" (default), or "greater". You can specify just the initial letter of the value, but the argument name must be given in full. See the ks.test documentation for the meanings of the possible values.
nc: (optional) - number of cores for parallel computations

Value

If fg_clustering is supplied (i.e. not NULL), returns a matrix of clusters x motifs (rows x columns) with the D statistic for each combination
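
To make the shape of the clustered result concrete, here is a minimal base-R sketch that builds a clusters x motifs matrix of D statistics. It is not the package's implementation: the matrices fg_mat and bg_mat, the vector clusters, and the helper d_stat are all hypothetical names and simulated data introduced only for illustration.

```r
# Hypothetical inputs: rows are peaks, columns are motifs
set.seed(42)
motifs <- c("motifA", "motifB", "motifC")
fg_mat <- matrix(rnorm(300 * 3), ncol = 3, dimnames = list(NULL, motifs))
bg_mat <- matrix(rnorm(400 * 3), ncol = 3, dimnames = list(NULL, motifs))
clusters <- sample(1:4, nrow(fg_mat), replace = TRUE)  # one label per foreground peak

# Only motifs present in both matrices can be compared
shared <- intersect(colnames(fg_mat), colnames(bg_mat))

# Hypothetical helper: D statistic for one vector of foreground energies
d_stat <- function(fg, bg, alternative = "less") {
  unname(stats::ks.test(fg, bg, alternative = alternative)$statistic)
}

# Outer loop over motifs (columns), inner loop over clusters (rows)
cluster_ids <- sort(unique(clusters))
d_mat <- sapply(shared, function(m) {
  sapply(cluster_ids, function(cl) {
    d_stat(fg_mat[clusters == cl, m], bg_mat[, m])
  })
})
rownames(d_mat) <- cluster_ids
```

Each row of d_mat corresponds to one cluster and each column to one motif, matching the rows x columns layout described above.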

Examples

if (FALSE) {
  # Motif energies for the foreground peaks and for a random-genome background
  pssm_fg <- generate_motif_pssm_matrix(my_atac_mc, datasets_of_interest = "jaspar")
  pssm_bg <- gen_random_genome_peak_motif_matrix(num_peaks = nrow(my_atac_mc@peaks), datasets_of_interest = "jaspar")

  # D statistics for all foreground peaks versus the background
  d_vs_rg <- calculate_d_stats(pssm_fg, pssm_bg)

  # D statistics per peak cluster versus the background
  peak_clust <- gen_atac_peak_clust(my_atac_mc, k = 12)
  d_vs_rg_cl <- calculate_d_stats(pssm_fg, pssm_bg, fg_clustering = peak_clust)
}