R/gset_feat_select.r
mcell_gset_filter_multi.Rd
Select/filter gene features using multiple statistics from the gstat table. All genes passing the selected thresholds are included.
mcell_gset_filter_multi(
gstat_id,
gset_id,
T_tot,
T_top3,
T_szcor = NULL,
T_vm = NULL,
T_niche = NULL,
force_new = F,
blacklist = c()
)
gstat_id: the ID of the gstat object to use
gset_id: if this exists, the function will restrict the genes currently in the set to those matching the selected thresholds; if not, it will generate a new gene set object with one set including all selected genes
T_tot: total downsampled coverage threshold (genes with tot UMIs < T_tot are filtered out)
T_top3: threshold value for the third highest UMI count for the gene (genes with top3 < T_top3 are filtered out)
T_szcor: threshold value for the normalized size correlation statistic (only genes with sz_cor < T_szcor are selected). If you use this, consider values around -0.1, but evaluate your decision carefully using the gstat empirical data
T_vm: threshold value for the normalized var/mean (only genes with varmean > T_vm are selected). Recommended values are usually around 0.2, but this may vary with the data. Not recommended for datasets with highly heterogeneous cell sizes (e.g. in whole-organism datasets)
T_niche: threshold value for the normalized niche score statistic (only genes with niche_norm > T_niche are selected). Recommended to use in combination with szcor to add genes with strongly restricted expression patterns. Consider using values around 0.05
force_new: if TRUE, will overwrite the existing gene set object (gset_id) in the database if it exists
blacklist: optional list of gene IDs to be excluded
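The threshold logic described above can be illustrated with a minimal base-R sketch on a mock table. This is not the package's implementation: `filter_multi_sketch` and the mock `gstat` data frame are hypothetical, and the column names (`tot`, `top3`, `sz_cor`, `varmean`, `niche_norm`) are taken from the argument descriptions rather than from the real gstat object. The sketch also assumes that the optional statistic filters (T_szcor, T_vm, T_niche) are combined as a union, while the coverage filters (T_tot, T_top3) always apply; this page does not state how the function combines them.

```r
# Mock gene-statistics table. Column names are assumptions for illustration only.
gstat <- data.frame(
  tot        = c(500, 40, 300, 800, 250),
  top3       = c(6, 1, 4, 9, 1),
  sz_cor     = c(-0.20, -0.30, 0.05, 0.00, -0.15),
  varmean    = c(0.10, 0.50, 0.30, 0.05, 0.40),
  niche_norm = c(0.01, 0.10, 0.02, 0.08, 0.09),
  row.names  = c("geneA", "geneB", "geneC", "geneD", "geneE")
)

# Hypothetical helper mirroring the documented thresholds on a plain data frame.
filter_multi_sketch <- function(gstat, T_tot, T_top3,
                                T_szcor = NULL, T_vm = NULL, T_niche = NULL,
                                blacklist = c()) {
  # Coverage filters: genes with tot < T_tot or top3 < T_top3 are filtered out.
  keep_cov <- gstat$tot >= T_tot & gstat$top3 >= T_top3

  if (is.null(T_szcor) && is.null(T_vm) && is.null(T_niche)) {
    # No optional thresholds given: only the coverage filters apply.
    keep_stat <- rep(TRUE, nrow(gstat))
  } else {
    # ASSUMPTION: optional statistic filters are combined as a union.
    keep_stat <- rep(FALSE, nrow(gstat))
    if (!is.null(T_szcor)) keep_stat <- keep_stat | gstat$sz_cor < T_szcor
    if (!is.null(T_vm))    keep_stat <- keep_stat | gstat$varmean > T_vm
    if (!is.null(T_niche)) keep_stat <- keep_stat | gstat$niche_norm > T_niche
  }

  # Remove blacklisted gene IDs from the selection.
  setdiff(rownames(gstat)[keep_cov & keep_stat], blacklist)
}

selected <- filter_multi_sketch(gstat, T_tot = 100, T_top3 = 2,
                                T_szcor = -0.1, T_niche = 0.05,
                                blacklist = "geneD")
print(selected)
```

On real data, the thresholds would be chosen by inspecting the empirical distributions in the gstat table, as the argument descriptions recommend.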