Select/filter gene features using multiple statistics from the gstat table. All genes passing the selected thresholds are included.

mcell_gset_filter_multi(
  gstat_id,
  gset_id,
  T_tot,
  T_top3,
  T_szcor = NULL,
  T_vm = NULL,
  T_niche = NULL,
  force_new = F,
  blacklist = c()
)

Arguments

gstat_id

the ID of the gstat object to use

gset_id

the ID of the gene set object. If it already exists, the function restricts the genes currently in the set to those matching the selected thresholds; if not, it generates a new gene sets object with one set including all selected genes

T_tot

total downsampled coverage threshold (genes with tot UMIs < T_tot are filtered out)

T_top3

threshold value for the third highest UMI count for the gene (genes with top3 < T_top3 are filtered out)

T_szcor

threshold value for the normalized size correlation statistic (only genes with sz_cor < T_szcor are selected). If you use this, consider values around -0.1, but evaluate your choice carefully against the empirical gstat data

T_vm

threshold value for the normalized var/mean (only genes with varmean > T_vm are selected). Recommended values are usually around 0.2, but this may vary with the data. Not recommended for datasets with highly heterogeneous cell sizes (e.g. whole-organism datasets)

T_niche

threshold value for the normalized niche score statistic (only genes with niche_norm > T_niche are selected). Recommended for use in combination with T_szcor to add genes with strongly restricted expression patterns. Consider using values around 0.05

force_new

if TRUE, overwrites the existing gene set object (gset_id) in the database if it exists

blacklist

optional list of gene IDs to be excluded
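
Examples

The thresholds described above can be illustrated on a toy table in base R. This is a minimal sketch and not the package's implementation: the data frame, its column names (taken from the argument descriptions) and the helper filter_genes_sketch are hypothetical. In particular, it assumes that T_tot and T_top3 must both pass while the optional statistics are combined as a union, which is one reading of "add genes" in the T_niche description; check the package source before relying on that rule.

```r
# Toy stand-in for a gstat table. Column names follow the argument
# descriptions and are illustrative, not the package's actual schema.
gstat <- data.frame(
  name       = c("g1", "g2", "g3", "g4", "g5"),
  tot        = c(500, 20, 300, 400, 250),
  top3       = c(4, 1, 3, 5, 2),
  sz_cor     = c(-0.20, -0.30, 0.05, 0.00, -0.15),
  varmean    = c(0.10, 0.50, 0.30, 0.05, 0.40),
  niche_norm = c(0.01, 0.10, 0.02, 0.08, 0.00),
  stringsAsFactors = FALSE
)

# Hypothetical helper mirroring the documented thresholds.
filter_genes_sketch <- function(gstat, T_tot, T_top3, T_szcor = NULL,
                                T_vm = NULL, T_niche = NULL,
                                blacklist = c()) {
  # Coverage filters: genes below either threshold are filtered out.
  keep_cov <- gstat$tot >= T_tot & gstat$top3 >= T_top3

  # Optional statistics, combined as a union (an assumption of this sketch).
  pass_any <- rep(FALSE, nrow(gstat))
  if (!is.null(T_szcor)) pass_any <- pass_any | gstat$sz_cor < T_szcor
  if (!is.null(T_vm))    pass_any <- pass_any | gstat$varmean > T_vm
  if (!is.null(T_niche)) pass_any <- pass_any | gstat$niche_norm > T_niche

  # With no optional threshold set, fall back to the coverage filters only
  # (also an assumption of this sketch).
  if (is.null(T_szcor) && is.null(T_vm) && is.null(T_niche)) {
    pass_any <- rep(TRUE, nrow(gstat))
  }

  # Drop blacklisted gene IDs from the selection.
  setdiff(gstat$name[keep_cov & pass_any], blacklist)
}

selected <- filter_genes_sketch(gstat, T_tot = 100, T_top3 = 2,
                                T_szcor = -0.1, T_niche = 0.05,
                                blacklist = "g5")
print(selected)
```

In this toy table, g2 is dropped for low coverage, g3 passes neither optional statistic, and g5 is removed by the blacklist, leaving g1 and g4.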