DNA methylation landscapes of 1538 breast cancers reveal a replication-linked clock, epigenomic instability and cis-regulation
2021-08-10
This repository contains the code that generates the figures for the METABRIC RRBS paper. The code is split into Jupyter notebooks, which can be found at: https://github.com/tanaylab/metabric_rrbs
1.1 Run the notebooks
Due to the size of the METABRIC-RRBS dataset (~2.2TB full, 55GB pileup alone), we generated a few smaller processed files to help reproduce the analysis. Even this bundle is quite large (~50GB), and you can download it from:
https://metabric-rrbs.s3.eu-west-1.amazonaws.com/analysis_files.tar.gz
The above file contains a folder named db and a folder named data. Please copy them to the main folder before running the notebooks.
See files.md for a description of these files and pipeline.Rmd for the code that generated them.
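The download-and-copy step described above could be scripted roughly as follows. This is a minimal sketch, not part of the repository: the function names `download_bundle` and `install_bundle` are hypothetical, and because the internal layout of the archive is not documented here, the code assumes that db and data sit either at the top level of the archive or one directory below it.

```python
import shutil
import tarfile
import tempfile
import urllib.request
from pathlib import Path

BUNDLE_URL = "https://metabric-rrbs.s3.eu-west-1.amazonaws.com/analysis_files.tar.gz"


def download_bundle(dest, url=BUNDLE_URL):
    """Download the bundle (~50GB) to `dest` and return the path."""
    urllib.request.urlretrieve(url, str(dest))
    return Path(dest)


def install_bundle(archive, repo_root, folders=("db", "data")):
    """Extract `archive` and move the named folders into `repo_root`."""
    repo_root = Path(repo_root)
    moved = []
    # Extract inside repo_root so the final move stays on one filesystem.
    with tempfile.TemporaryDirectory(dir=repo_root) as tmp:
        tmp = Path(tmp)
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(tmp)
        for name in folders:
            # Assumption: the folder is at the top level or one level down.
            candidates = [tmp / name]
            candidates += [d / name for d in sorted(tmp.iterdir()) if d.is_dir()]
            found = next((c for c in candidates if c.is_dir()), None)
            if found is None:
                raise FileNotFoundError(f"'{name}' not found in {archive}")
            target = repo_root / name
            if target.exists():
                raise FileExistsError(f"{target} already exists")
            shutil.move(str(found), str(target))
            moved.append(target)
    return moved
```

A typical call would be `install_bundle(download_bundle("analysis_files.tar.gz"), ".")` from the main folder of the repository. Extraction needs roughly as much free disk space as the unpacked bundle, in addition to the archive itself.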
1.2 Dependencies
The initialization script (scripts/init.R) automatically installs the R packages needed to run the notebooks.
1.3 Notebook order
It is recommended to run the notebooks in the following order:
- coverage-stats
- TME
- Epigenomic-scores
- Loss-clock
- epigenomic-instability
- CNA
- mutations
- survival
- cis
- dosage-compensation
Running the notebooks in a different order may work for some of them, but others may fail because they depend on data generated by earlier notebooks.
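One way to enforce the recommended order is to execute the notebooks in batch and stop at the first failure. The sketch below is an illustration under stated assumptions: it assumes each notebook is stored as `<name>.ipynb` in a single directory (the actual file names and locations are not given here), that `jupyter nbconvert` is installed, and that a kernel able to run the notebooks is available. `build_commands` and `run_all` are hypothetical helper names.

```python
import subprocess
from pathlib import Path

# Order taken from the list above.
NOTEBOOK_ORDER = [
    "coverage-stats",
    "TME",
    "Epigenomic-scores",
    "Loss-clock",
    "epigenomic-instability",
    "CNA",
    "mutations",
    "survival",
    "cis",
    "dosage-compensation",
]


def build_commands(notebook_dir="."):
    """Return one nbconvert command per notebook, in the recommended order."""
    # Assumption: notebooks are named <name>.ipynb inside notebook_dir.
    return [
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace",
         str(Path(notebook_dir) / f"{name}.ipynb")]
        for name in NOTEBOOK_ORDER
    ]


def run_all(notebook_dir=".", runner=subprocess.run):
    """Execute the notebooks sequentially; check=True aborts on the first error."""
    for cmd in build_commands(notebook_dir):
        runner(cmd, check=True)
```

Calling `run_all()` from the main folder would then run every notebook in sequence, so a failure in an early notebook prevents later ones from running against missing intermediate data.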
1.4 Find a specific figure
figures-key.md lists where each figure in the paper was generated.