Cluster metacells based on ATAC profiles using the k-means algorithm

Usage

gen_atac_mc_clust(
  atac_mc,
  use_prior_annot = TRUE,
  k = NULL,
  annot = "cell_type",
  ...
)

Arguments

atac_mc

an McPeaks object

use_prior_annot

(optional) when TRUE, use the existing metacell annotation to generate the metacell clusters. Clusters are then defined by the categorical field annot in the metadata slot of the McPeaks object.

k

(optional, used when use_prior_annot == FALSE) number of clusters to generate

annot

name of the metadata field to use when use_prior_annot == TRUE.

...

Arguments passed on to tglkmeans::TGL_kmeans

df: a data frame or a matrix. Each row is a single observation and each column is a dimension. the first column can contain id for each observation (if id_column is TRUE), otherwise the rownames are used.
metric: distance metric for kmeans++ seeding. can be 'euclid', 'pearson' or 'spearman'
max_iter: maximal number of iterations
min_delta: minimal change in assignments (fraction out of all observations) to continue iterating
verbose: display algorithm messages
keep_log: keep algorithm messages in 'log' field
id_column: df's first column contains the observation id
reorder_func: function used to reorder the clusters. It is applied to each center, and the clusters are ordered by the result; e.g. reorder_func = mean computes the mean of each center and reorders the clusters accordingly. If reorder_func = hclust, the centers are ordered by hierarchical clustering of the Euclidean distances between the rows of their correlation matrix, i.e. hclust(dist(cor(t(centers)))). If NULL, no reordering is done.
hclust_intra_clusters: run hierarchical clustering within each cluster and return an ordering of the observations.
seed: seed for the c++ random number generator
parallel: run the hierarchical clustering of each cluster in parallel (only relevant if hclust_intra_clusters is TRUE)
use_cpp_random: use the C++ random number generator instead of R's. This should be used only for backwards compatibility: from version 0.4.0 onwards, the default random number generator was changed to R's.
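
The two reorder_func behaviours described above can be illustrated with base R alone. This is a minimal sketch that follows the wording of the description, not the internal code of tglkmeans; the centers matrix and its values are made up for illustration.

```r
# Hypothetical cluster centers: one row per cluster, one column per dimension.
centers <- rbind(
  c1 = c(1, 2, 3, 4),
  c2 = c(8, 6, 4, 2),
  c3 = c(0, 1, 2, 5),
  c4 = c(9, 5, 3, 1)
)

# reorder_func = mean: score each center by its mean, then sort by the score.
order_by_mean <- order(apply(centers, 1, mean))

# reorder_func = hclust: order the centers by hierarchical clustering of the
# Euclidean distances between the rows of their correlation matrix.
hc <- hclust(dist(cor(t(centers))))
order_by_hclust <- hc$order

rownames(centers)[order_by_mean]
rownames(centers)[order_by_hclust]
```

Ordering by the mean sorts clusters along a single summary value, whereas the hclust ordering places centers with similar correlation profiles next to each other (here c1 next to c3, and c2 next to c4).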

Value

A named numeric vector giving the cluster of each metacell.
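
A named numeric vector of this kind can be summarised with base R. The sketch below uses a hand-made stand-in for the return value; the metacell names and cluster numbers are hypothetical, and it assumes the vector names identify the metacells.

```r
# Hypothetical stand-in for the value returned by gen_atac_mc_clust():
# names are assumed to be metacell ids, values are cluster numbers.
mc_clusters <- c(mc1 = 1, mc2 = 3, mc3 = 1, mc4 = 2, mc5 = 3)

# Number of metacells in each cluster.
cluster_sizes <- table(mc_clusters)

# Metacell names grouped by cluster (a list keyed by cluster number).
mc_by_cluster <- split(names(mc_clusters), mc_clusters)

# Cluster of a single metacell, looked up by name.
cluster_of_mc2 <- mc_clusters[["mc2"]]
```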

Examples

if (FALSE) {
## Use "default clustering": the existing annotations
mc_clusters <- gen_atac_mc_clust(my_atac_mc, use_prior_annot = TRUE)

## Identify peaks of interest, namely peaks neighboring a set of feature genes,
## and use only them for clustering.
## Assumes `tss` (TSS intervals with a `name` column) and `feature_genes`
## (a vector of gene names) are already defined in the session.
nei_peaks_feat_genes <- gintervals.neighbors(my_atac_mc@peaks, tss[tss$name %in% feature_genes, ], maxdist = 5e+5)
peaks_of_interest <- nei_peaks_feat_genes[, c("chrom", "start", "end")]
mc_clusters <- gen_atac_mc_clust(my_atac_mc, k = 16, peak_set = peaks_of_interest, use_prior_annot = FALSE)
}