generate an expected hic track based on observed hic data — shaman_shuffle_hic

shaman_shuffle_hic_track

shaman_shuffle_hic_track(
  track_db,
  obs_track_nm,
  work_dir,
  exp_track_nm = paste0(obs_track_nm, "_shuffle"),
  max_jobs = 25,
  shuffle = 80,
  grid_small = 500000,
  grid_high = 1000000,
  grid_step_iter = 40,
  dist_resolution = NA,
  smooth = NA
)

Arguments

track_db: Directory of the misha database.
obs_track_nm: Name of observed 2D genomic track for the hic data.
work_dir: Centralized directory to store temporary files.
exp_track_nm: Name of expected 2D genomic track.
max_jobs: Maximum number of qsub or local jobs. For optimal performance, provide the number of chromosomes.
shuffle: Average number of shuffling transitions for each observed point in the chromosomal contact matrix.
grid_small: Initial size of the maximum distance between contact pairs considered for switching.
grid_high: Final size of the maximum distance between contact pairs considered for switching.
grid_step_iter: Number of iterations at each grid size.
dist_resolution: Number of bins in each log2 distance unit. If NA, value is determined based on observed data (recommended).
smooth: Number of bins to use for smoothing the MCMC target function: the decay curve. If NA, value is determined based on observed data (recommended).
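
To illustrate what "bins in each log2 distance unit" means for dist_resolution, the sketch below assigns contact distances to log2-spaced bins. It assumes the bin index is floor(log2(distance) * dist_resolution); the helper log2_distance_bin is hypothetical and not part of shaman, and the package's internal binning may differ.

```r
# Hypothetical helper (not part of shaman): map a contact distance in bp
# to a log2-spaced bin, with `dist_resolution` bins per log2 unit.
# The floor(log2(d) * resolution) rule is an assumption for illustration.
log2_distance_bin <- function(distance, dist_resolution) {
    stopifnot(is.numeric(distance), all(distance > 0), dist_resolution > 0)
    floor(log2(distance) * dist_resolution)
}

distances <- c(1024, 2048, 4096) # each distance doubles the previous one
bins_coarse <- log2_distance_bin(distances, dist_resolution = 1)
bins_fine <- log2_distance_bin(distances, dist_resolution = 4)

# Bins spanned by each doubling of distance at the two resolutions
diff(bins_coarse)
diff(bins_fine)
```

Under this assumption, each doubling of the distance spans dist_resolution bins, so a higher value gives a finer description of the decay curve.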

Details

This function generates an expected 2D hic track based on observed hic data. Each chromosome is shuffled separately to generate an expected shuffled contact matrix. Note that this function requires sge (qsub) or multicore to be enabled. Either mode can be set via shaman.sge_support or shaman.mc_support in the shaman.conf file. Reshuffling an entire dataset requires 7 hours per 1 billion reads on a machine with one core per chromosome.
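
As a rough planning aid, the runtime figure quoted above can be turned into a small calculation. The sketch below assumes runtime scales linearly with the number of reads and that one core per chromosome is available; the function estimate_shuffle_hours is hypothetical and not part of shaman.

```r
# Hypothetical helper (not part of shaman): estimate shuffling time from the
# documented figure of 7 hours per 1 billion reads. Linear scaling with read
# count is an assumption made for illustration.
estimate_shuffle_hours <- function(n_reads, hours_per_billion = 7) {
    stopifnot(is.numeric(n_reads), all(n_reads >= 0))
    hours_per_billion * n_reads / 1e9
}

# Example with a hypothetical dataset of 250 million reads
est_hours <- estimate_shuffle_hours(2.5e8)
est_hours
```

Actual runtimes will depend on the hardware, on the shuffle and grid_step_iter settings, and on how many jobs run in parallel.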

Each step creates temporary files of the shuffled matrices which are then joined to a track. Temporary files are deleted upon track creation.

Examples


# The example below runs on the test misha db provided with shaman.
# Note that this is a toy db sampled from K562 ela data - shuffling the observed track will not produce the expected track.
# options(shaman.sge_support=1) #configuring sge engine mode - preferred
options(shaman.mc_support = 1) # configuring multi-core mode
if (gtrack.exists("hic_obs_shuffle")) {
    gtrack.rm("hic_obs_shuffle", force = TRUE)
    gdb.reload()
}
ret <- shaman_shuffle_hic_track(shaman::shaman_get_test_track_db(),
    obs_track_nm = "hic_obs",
    work_dir = tempdir(), # tempdir() is suitable only in multi-core mode; in sge mode, work_dir must be accessible by all jobs
    shuffle = 1, # default is 80
    grid_step_iter = 1, # default is 40
    max_jobs = parallel::detectCores() # optimally set to the number of chromosomes
)
#> Error in setwd(groot): cannot change working directory
gdb.reload()
gtrack.ls("hic_obs_shuffle") # new shuffled track that was created
#> character(0)
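
The setwd(groot) error in the output above is consistent with the database directory not being reachable when the example was run, although that reading is an assumption. A minimal guard, using only base R, could check both directories before the call; the helper check_shuffle_dirs is hypothetical and not part of shaman.

```r
# Hypothetical guard (not part of shaman): stop early with a clearer message
# when the database directory or the work directory does not exist.
check_shuffle_dirs <- function(track_db, work_dir) {
    dirs <- c(track_db = track_db, work_dir = work_dir)
    missing_dirs <- dirs[!dir.exists(dirs)]
    if (length(missing_dirs) > 0) {
        stop(
            "directory not found: ",
            paste(names(missing_dirs), missing_dirs, sep = " = ", collapse = ", ")
        )
    }
    invisible(TRUE)
}

# tempdir() always exists within a session, so this check passes
check_shuffle_dirs(track_db = tempdir(), work_dir = tempdir())
```

In practice, track_db would be the value returned by shaman::shaman_get_test_track_db(), or the path to your own misha database.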