Analysis • mcATAC

library(mcATAC)
library(ggplot2)
ggplot2::theme_set(ggplot2::theme_classic())

Download fragments data

We will start by downloading the fragments data (and it’s index) from 10x:

if (!dir.exists("pbmc_data") || !file.exists("pbmc_data/fragments.tsv.gz") || !file.exists("pbmc_data/fragments.tsv.gz.tbi")) {
    download_pbmc_example_data(fragments = TRUE)
}

This would download the fragments file and its index from:

PBMC from a healthy donor - granulocytes removed through cell sorting (10k)

Note that the ‘fragments’ file is large - ~2GB, so this step might take some time.

Import raw reads into a ScCounts format

mcATAC allows you to load the raw counts data into an ScCounts object. A ScCounts holds the raw reads as a set of sparse matrices where the rows are genomic coordinates and the columns are cell barcodes. The number of sparse matrices (“genomic bins”) is controlled by the bin_size parameter. The matrices are stored in a ‘gzip’ compressed form under a directory called ‘data’, together with a ‘yaml’ file containing the some metadata such as the genomic bins coordinates and the cell barcodes.

We can now import the raw reads into a ScCounts object that we are going to store at “pbmc_data/reads”. Note that this might take ~4 minutes on a machine with 96 cores.

barcodes <- tgutil::fread("pbmc_data/barcodes.csv", header = FALSE)[, 1]
scc_from_fragments("pbmc_data/fragments.tsv.gz", "pbmc_data/reads", cell_names = barcodes, genome = "hg38", overwrite = TRUE)

If you have multiple batches, please add a suffix to the cell names using the cell_suffix parameter, e.g.:

scc_from_fragments("pbmc_data/fragments.tsv.gz", "pbmc_data/reads", cell_names = barcodes, cell_suffix = "batch1", genome = "hg38", overwrite = TRUE)

Load ScCounts object

sc_counts <- scc_read("pbmc_data/reads")
sc_counts

Project RNA metacells

We can now project metacells derived from the RNA data (or any other way) into an McCounts object which would hold the counts per metacell:

data(cell_to_metacell_pbmc_example)
mc_counts <- scc_to_mcc(sc_counts, cell_to_metacell_pbmc_example)
mc_counts

Save McCounts object

mcc_write(mc_counts, "pbmc_data/mc_counts", overwrite = TRUE)

Create ‘misha’ tracks for each metacell

We can now create a ‘misha’ track for each metacell:

mct <- mcc_to_tracks(mc_counts, "pbmc_mc", overwrite = TRUE, create_marginal_track = TRUE, window_size = NULL, resolution = 20)
mct

The window size paramter controls the smoothing of the tracks. If it set to NULL, the tracks will not be smoothed, except for the resolution.

We can now use the tracks we have created to identify candidate regularoty elements (CREs) for ‘trans’ analysis, or to analyse the genomic regions in ‘cis’.

Identify CREs (candidate regulatory elements)

We can now use the normalized marginal track we created in order to identify CREs:

cres <- call_peaks("pbmc_mc.marginal_normed", quantile_thresh = 0.9, min_umis = 8, target_size = 300, genome = "hg38")

We can diagnose our peak calling by plotting the marginal track:

plot_marginal_coverage("pbmc_mc.marginal_normed", interval = cres[967, ], window_size = 500, peaks = cres, expand = 1e4, show_thresh = TRUE, quantile_thresh = 0.9, min_umis = 8, genome = "hg38", log_scale = FALSE)

Create a McPeaks object from McCounts object

After we have indentified the CREs, we can create a McPeaks object from the McCounts object that we have created.

atac_mc <- mcc_to_mcatac(mc_counts, cres)
atac_mc

Create a McTracks object

We already created a ‘McTracks’ object when we created the tracks from the counts object, but we can also create it without re-generating the tracks:

mct <- mct_create(genome = "hg38", track_prefix = "pbmc_mc", id = mc_counts@id, description = mc_counts@description)

Add RNA and metadata to the McTracks object

We can now add the RNA and metadata to the McTracks object:

data(rna_mc_mat)
mct <- add_mc_rna(mct, rna_mc_mat)

An important metadata field is the ‘cell_type’ field. In addition, a field named ‘color’ can be added with a color for each cell type.

data(mcmd)
head(mcmd)
mct <- add_mc_metadata(mct, mcmd)

Plot a genomic region

mct_plot_region(mct, gintervals(5, 54513252, 55535253), detect_dca = TRUE, gene_annot = TRUE)

Create a shiny app

run_app(mct)

Deploy the shiny app

create_bundle(mct, path = "/path/to/the/server/pbmc_app", overwrite = TRUE, restart = TRUE)