R/shaman.R
shaman_score_hic_track.Rd
shaman_score_hic_track
shaman_score_hic_track(
track_db,
work_dir,
score_track_nm,
obs_track_nms,
exp_track_nms = paste0(obs_track_nms, "_shuffle"),
points_track_nms = obs_track_nms,
near_cis = 5000000,
expand = 2000000,
k = 100,
max_jobs = 100
)
Directory of the misha database.
Centralized directory to store temporary files. In sge mode it must be accessible to all jobs.
Score track that will be created.
Names of observed 2D genomic tracks of the Hi-C data. Pooling of multiple observed tracks is supported (see the pooled-call sketch after the argument descriptions).
Names of expected (shuffled) 2D genomic tracks. Pooling of multiple expected tracks is supported.
Names of 2D genomic tracks that contain the points on which to compute the normalized score. Pooling points from multiple tracks is supported.
Size of each matrix in the grid, in base pairs.
Size of the expansion, in base pairs: points outside the matrix that are included so the score can be computed accurately. Note that for each observed point, its k nearest neighbors must fall within the expanded matrix.
The number of neighbor distances used for the score. For higher resolution maps, increase k. For lower resolution maps, decrease k.
Maximal number of qsub jobs (in sge mode) or parallel processes (in multicore mode).
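Pooling is expressed by passing several track names to obs_track_nms (and, correspondingly, exp_track_nms). Below is a minimal sketch of a pooled call; the database path and replicate track names are hypothetical and must exist in your misha database.
# Hypothetical pooled call - the db path and the hic_rep1_obs / hic_rep2_obs tracks
# (with matching *_shuffle expected tracks) are assumptions, not part of shaman.
shaman_score_hic_track(
    track_db = "/path/to/misha_db",
    work_dir = "/path/to/shared_tmp", # must be reachable by all jobs in sge mode
    score_track_nm = "hic_pooled_score",
    obs_track_nms = c("hic_rep1_obs", "hic_rep2_obs"), # pooled observed tracks
    # exp_track_nms defaults to paste0(obs_track_nms, "_shuffle"),
    # i.e. c("hic_rep1_obs_shuffle", "hic_rep2_obs_shuffle")
    k = 100
)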
This function generates a 2D score track from observed and expected Hi-C data. The score is computed by laying a grid of small matrices over all chromosomes and scoring each matrix independently. For each observed point, the model computes the KS D statistic between the distances to its k nearest neighbors in the observed data and in the expected data. High scores indicate contact enrichment, while low scores indicate insulation.
Note that this function requires either sge (qsub) or multicore support to compute in a timely manner; these modes are configured via the shaman.sge_support or shaman.mc_support parameters in the shaman.conf file. Score computation on 1 billion reads on a distributed system may take 4-10 hours (with default parameters), depending on the number of cores available.
Each step writes temporary files containing the matrix scores, which are then joined into a track. The temporary files are deleted once the track has been created.
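To make the statistic concrete, the sketch below computes a KS D statistic for a single point: the distances to its k nearest neighbors among simulated observed contacts are compared with the distances among simulated expected contacts. This illustrates the idea only and is not the package's internal implementation; all coordinates are simulated.
# Illustration of the per-point statistic (simulated data, not shaman internals):
set.seed(1)
k <- 100
point <- c(1e6, 2e6) # focal 2D contact (coordinate on axis 1, coordinate on axis 2)
obs_xy <- cbind(rnorm(5000, 1e6, 5e4), rnorm(5000, 2e6, 5e4)) # contacts enriched near the point
exp_xy <- cbind(runif(5000, 0, 2e6), runif(5000, 1e6, 3e6)) # diffuse expected background

knn_dists <- function(xy, p, k) {
    d <- sqrt((xy[, 1] - p[1])^2 + (xy[, 2] - p[2])^2) # Euclidean distance to the point
    sort(d)[seq_len(k)] # distances of the k nearest neighbors
}

obs_d <- knn_dists(obs_xy, point, k)
exp_d <- knn_dists(exp_xy, point, k)

# KS D statistic: maximal ECDF difference between the two sets of distances.
# Observed neighbors much closer than expected ones indicate contact enrichment.
ks.test(obs_d, exp_d)$statistic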
# The example below runs on the test misha db provided with shaman.
# Note that this is a toy db sampled from K562 ela data -
# scoring based on the observed and expected tracks will not produce the score track,
# as most of the genome is missing (you will see the message: number of points in focus interval < 1000).
# options(shaman.sge_support = 1) # configuring sge engine mode - preferred
options(shaman.mc_support = 1) # configuring multi-core mode
if (gtrack.exists("hic_score_new")) {
gtrack.rm("hic_score_new", force = TRUE)
gdb.reload()
}
ret <- shaman_score_hic_track(shaman_get_test_track_db(),
    work_dir = tempdir(), # tempdir() can be used only in multi-core mode; in sge mode, work_dir must be accessible to all jobs
    score_track_nm = "hic_score_new",
    obs_track_nms = "hic_obs",
    exp_track_nms = "hic_exp",
    near_cis = 1e09, # the test db contains very little data, so the size of each job can be increased
    max_jobs = parallel::detectCores() # in sge mode, increase the number of jobs for optimal runtime
)
#> Error in setwd(groot): cannot change working directory
gdb.reload()
gtrack.ls("hic_score_new") # the new score track that was created
#> character(0)
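On a full database, where the score track is actually created, its values can be inspected with misha. A minimal sketch, assuming a chromosome named "chr1" exists in the database and that "hic_score_new" was successfully built:
# Hypothetical follow-up - "chr1" and a successfully created score track are assumptions.
scores <- gextract("hic_score_new", gintervals.2d("chr1", 0, 1e7, "chr1", 0, 1e7))
head(scores) # 2D intervals with the normalized score of each contact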