Find dynamic peaks in McPeaks matrix: identify_dynamic_peaks

This function identifies "dynamic" peaks, i.e. peaks with high accessibility in only a subset of the cells. They are identified by an overdispersed coefficient of variation (COV = standard deviation / mean) compared with other peaks of similar mean.
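As a rough illustration of the statistic involved, the sketch below computes each peak's mean and coefficient of variation across metacells on a small, made-up peaks x metacells matrix. The helper `peak_cov_stats` and the matrix `mat` are hypothetical and are not part of the package API; identify_dynamic_peaks computes these values internally from the McPeaks object.

```r
# Hypothetical helper (not the package API): per-peak mean and coefficient
# of variation (sd / mean) across metacells, for a peaks x metacells matrix.
peak_cov_stats <- function(mat) {
  peak_mean <- rowMeans(mat)
  peak_sd <- apply(mat, 1, sd)
  data.frame(mean = peak_mean, cov = peak_sd / peak_mean)
}

# Toy matrix: 3 peaks x 4 metacells (made-up values).
mat <- rbind(
  peak1 = c(1, 1, 1, 1),  # uniform signal: COV = 0
  peak2 = c(0, 0, 0, 4),  # signal in a single metacell: COV = 2
  peak3 = c(2, 3, 2, 3)   # mild variation: COV of about 0.23
)
stats <- peak_cov_stats(mat)
print(stats)
```

Peaks 1 and 2 have the same mean, but only peak 2 concentrates its signal in a subset of metacells, which is exactly what a high COV captures.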

Usage

identify_dynamic_peaks(
  atac_mc,
  method = "bmq",
  plot = TRUE,
  mean_thresh_q = 0.1,
  cov_q_thresh = 0.75,
  num_bins = 200,
  gmm_g = 4
)

Arguments

atac_mc: the McPeaks object to analyze
method: (optional) either 'bmq' (default) or 'gmm'. 'bmq' (binned-mean quantiles) bins the log-means of all peaks (averaged across metacells) and, within each bin, selects the peaks whose coefficient of variation lies above a given quantile (cov_q_thresh). The more controlled 'gmm' method fits a Gaussian mixture model to the log10(COV) vs. log10(mean) distribution and selects the peaks in clusters that show overdispersion in the COV.
plot: whether to plot the peak mean against the coefficient of variation (both on a log10 scale). It is highly recommended to inspect this scatter plot before proceeding, so set this parameter to FALSE only after you have made sure that the scatter looks reasonable.
mean_thresh_q: (optional) quantile threshold on the peaks' mean; peaks with a mean below this quantile are not considered
cov_q_thresh: (optional) minimum COV quantile, within each mean bin, for a peak to be considered dynamic
num_bins: (optional) number of bins into which the peaks' means are divided
gmm_g: (optional) number of mixture components (groups) for the 'gmm' method
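The 'bmq' selection described under method can be sketched in a few lines of base R. This is a minimal sketch under stated assumptions, not the package's implementation: the function `select_bmq` is hypothetical, it takes per-peak means and COVs as plain vectors, it assumes mean_thresh_q drops the low-mean peaks before binning, and it uses equal-width bins on log10(mean). Because log10 is monotone, comparing raw COVs against each bin's quantile selects the same peaks as comparing log10(COV).

```r
# Hypothetical sketch of binned-mean-quantile ("bmq") selection; this is not
# the package's implementation. Returns the indices of peaks called dynamic.
select_bmq <- function(peak_mean, peak_cov, mean_thresh_q = 0.1,
                       cov_q_thresh = 0.75, num_bins = 200) {
  # Discard non-positive means (log10 undefined) and peaks whose mean is
  # below the mean_thresh_q quantile of all peak means (assumed behavior).
  min_mean <- quantile(peak_mean, mean_thresh_q, names = FALSE)
  keep <- which(peak_mean > 0 & peak_mean >= min_mean)
  log_mean <- log10(peak_mean[keep])
  cov_kept <- peak_cov[keep]

  # Divide the log-means into num_bins equal-width bins.
  bins <- cut(log_mean, breaks = num_bins)

  # Within each non-empty bin, flag peaks whose COV reaches the bin's
  # cov_q_thresh quantile.
  is_dynamic <- logical(length(keep))
  for (in_bin in split(seq_along(keep), bins, drop = TRUE)) {
    cv <- cov_kept[in_bin]
    is_dynamic[in_bin] <- cv >= quantile(cv, cov_q_thresh, names = FALSE)
  }
  keep[is_dynamic]
}

# Toy data: two groups of peaks with very different means, same COV spread.
peak_mean <- c(1, 1, 1, 1, 1000, 1000, 1000, 1000)
peak_cov  <- c(1, 2, 3, 4, 1, 2, 3, 4)
select_bmq(peak_mean, peak_cov, mean_thresh_q = 0, cov_q_thresh = 0.75,
           num_bins = 2)
# returns 4 8: the highest-COV peak within each mean bin
```

Binning by mean before thresholding matters because COV depends strongly on mean; a single global COV cutoff would tend to select mostly low-mean peaks.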

Value

a PeakIntervals object with the peaks identified as dynamic. If plot = TRUE, the selected points are also plotted.

Examples

if (FALSE) {
dynamic_peaks_by_bmq <- identify_dynamic_peaks(atac_mc, method = "bmq", mean_thresh_q = 0.1, cov_q_thresh = 0.6, num_bins = 100)
dynamic_peaks_by_gmm <- identify_dynamic_peaks(atac_mc, method = "gmm", gmm_g = 3)
}