The shaman package implements functions for resampling Hi-C matrices in order to generate expected contact distributions given constraints on marginal coverage and contact-distance probability distributions. The package also supports visualization of normalized matrices and statistical analysis of contact distributions around selected landmarks. It is an R package that embeds algorithms implemented in C++ and builds on the Tanay lab's misha genomic database system, which is provided with the package.
The normalization workflow consists of the following steps:
In the example misha database provided with this package, we have created a low-footprint matrix to exemplify the shaman workflow. We included 4.6 million contacts from the ELA K562 dataset covering the hoxd locus (chr2:175e06-178e06) and convergent CTCF regions. Processing the complete matrix from this study requires downloading the full contact list and regenerating the reshuffled matrix.
Source code can be found at: https://github.com/tanaylab/shaman
The quickest way to install shaman is to use the following command:
remotes::install_github('tanaylab/shaman')
If remotes fails to install all of the Bioconductor requirements, please install Gviz and GenomeInfoDb manually from Bioconductor:
source("https://bioconductor.org/biocLite.R")
biocLite("Gviz")
biocLite("GenomeInfoDb")
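On Bioconductor 3.8 and later, biocLite has been replaced by the BiocManager package, so the commands above may fail on a current R installation. The following is a minimal sketch of the equivalent step, assuming a recent R version. The helper names missing_pkgs and install_bioc_deps are hypothetical and are not part of shaman.

```r
# Hypothetical helper (not part of shaman): return the packages in `pkgs`
# that cannot be found in the current R library.
missing_pkgs <- function(pkgs) {
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  unname(pkgs[!is_installed])
}

# Hypothetical helper (not part of shaman): install only the missing
# Bioconductor dependencies through BiocManager.
install_bioc_deps <- function(pkgs = c("Gviz", "GenomeInfoDb")) {
  to_install <- missing_pkgs(pkgs)
  if (length(to_install) == 0) {
    return(invisible(character(0)))  # nothing to do
  }
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")  # BiocManager itself is on CRAN
  }
  BiocManager::install(to_install)
  invisible(to_install)
}

# Uncomment to run the installation:
# install_bioc_deps()
```

Wrapping the step in a function means nothing is installed until it is called explicitly, and packages that are already present are skipped.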
In order to install from source, please take the following steps:
install.packages("http://www.wisdom.weizmann.ac.il/~nettam/shaman/misha_3.5.6.tar.gz", repos=NULL) # Download and install the misha package
source("https://bioconductor.org/biocLite.R") # Install Gviz and GenomeInfoDb from Bioconductor
biocLite("Gviz")
biocLite("GenomeInfoDb")
install.packages("http://www.wisdom.weizmann.ac.il/~nettam/shaman/shaman_2.0.tar.gz", repos=NULL) # Download and install the shaman package
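After installation, it can be useful to confirm that shaman and its dependencies are visible to R. The following is a small sketch of such a check. The function name check_installed is hypothetical and is not part of shaman; the package names are those mentioned in the steps above.

```r
# Hypothetical helper (not part of shaman): report, for each package name,
# whether its namespace can be loaded from the current R library.
check_installed <- function(pkgs = c("misha", "Gviz", "GenomeInfoDb", "shaman")) {
  vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
}

# Named logical vector: TRUE for packages that are available.
status <- check_installed()
print(status)
```

Any entry reported as FALSE points to the installation step that should be repeated.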
Please refer to https://tanaylab.github.io/shaman/articles/shaman-package.html for usage and workflow.