Extracts average methylation data from tracks.
gpatterns.get_avg_meth(tracks, intervals, iterator = NULL, min_cov = NULL, mask_by_cov = FALSE, use_cpgs = FALSE, min_samples = NULL, min_cpgs = NULL, min_var = NULL, var_quantile = NULL, min_range = NULL, names = NULL, tidy = TRUE, pre_screen = FALSE, use_disk = FALSE, file = NULL, intervals.set.out = NULL, sum_tracks = FALSE)
min_range
There are two main modes:
The 'tidy' option is very convenient for further analysis, but note that for large amounts of data it may be too slow. The 'not tidy' version, on the other hand, returns only the average methylation and not the raw 'meth' and 'unmeth' calls. In general, choose the mode according to the following guidelines:
use_disk == TRUE. Note that, in general, working with a huge number of genomic regions is not useful, both in terms of performance (memory consumption, slow algorithms) and analysis (more 'noise'). A good practice is to select the genomic regions carefully, for example by requiring minimal coverage (min_cov) in a minimal number of samples (min_samples), a minimal number of CpGs (min_cpgs), taking only the most variable regions (min_var, var_quantile), or by taking sets of annotated regions (e.g. promoters, enhancers).
pre_screen = TRUE. This first filters the CpGs and only then extracts the methylation into memory.
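The region-selection criteria mentioned above (minimal coverage in a minimal number of samples, keeping only the most variable regions) can be illustrated on toy data. The following is a minimal base R sketch, not the package's implementation: the matrices `cov` and `avg`, the thresholds, and the exact way the filters are combined are assumptions made for illustration only.

```r
# Toy matrices: rows are regions, columns are samples (all values are made up).
cov <- matrix(c(12, 15,  9,
                 3, 20,  2,
                11, 10, 14,
                 0,  1,  2),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("region", 1:4), paste0("sample", 1:3)))
avg <- matrix(c(0.10, 0.90, 0.50,
                0.20, 0.25, 0.30,
                0.40, 0.42, 0.41,
                0.00, 1.00, 0.50),
              nrow = 4, byrow = TRUE, dimnames = dimnames(cov))

# Hypothetical thresholds, named after the corresponding function arguments.
min_cov     <- 10
min_samples <- 2
min_var     <- 0.01

# Keep regions with at least min_cov calls in at least min_samples samples.
well_covered <- rowSums(cov >= min_cov) >= min_samples

# Variance of average methylation across samples, per region.
row_var <- apply(avg, 1, var)

# Regions passing both the coverage filter and the variance filter.
selected <- names(which(well_covered & row_var >= min_var))
print(selected)
```

In this toy setting only the first region is both well covered and variable; the third region is well covered but nearly constant across samples, so it is dropped by the variance filter.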
To understand the concept of iterators and intervals, see gextract and the misha package in general.
The function works in the following way: for every interval in intervals, the function extracts the methylation calls in each iterator interval and calculates the average.
Beware the difference between intervals and iterator: the intervals parameter sets the global genomic scope of the function (what part of the genome to look at to begin with). The iterator parameter sets the iterator intervals, which are the chunks of the genome from which the methylation calls are extracted.
For example, setting the iterator to gintervals.all() would calculate the average methylation of every chromosome, whereas setting the intervals to gintervals.all() would just mean that the calculations of the iterator intervals would not be limited to a specific part of the genome; for example, if iterator=NULL, methylation would be extracted from all the genomic CpGs.
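The distinction between intervals (scope) and iterator (averaging chunks) can be mimicked on a toy CpG table without misha. This is a conceptual base R sketch only: `avg_meth_toy`, `in_any`, and the toy data are hypothetical, and the average is assumed here to be the pooled ratio of methylated calls to total calls within each iterator interval.

```r
# Toy CpG table: one row per CpG with methylated / unmethylated call counts.
cpgs <- data.frame(
  chrom  = c("chr1", "chr1", "chr1", "chr1", "chr2"),
  start  = c(100, 150, 600, 650, 100),
  meth   = c(8, 2, 1, 0, 5),
  unmeth = c(2, 8, 9, 10, 5),
  stringsAsFactors = FALSE
)

# TRUE for every CpG that falls inside at least one half-open [start, end) interval.
in_any <- function(cpgs, ivs) {
  vapply(seq_len(nrow(cpgs)), function(i) {
    any(ivs$chrom == cpgs$chrom[i] &
          ivs$start <= cpgs$start[i] & cpgs$start[i] < ivs$end)
  }, logical(1))
}

# 'intervals' limits which CpGs are considered at all;
# 'iterator' defines the chunks over which calls are pooled and averaged.
avg_meth_toy <- function(cpgs, intervals, iterator = NULL) {
  scoped <- cpgs[in_any(cpgs, intervals), , drop = FALSE]
  if (is.null(iterator)) {
    # No iterator: one row per CpG inside the scope.
    return(data.frame(chrom = scoped$chrom, start = scoped$start,
                      end = scoped$start + 1,
                      avg = scoped$meth / (scoped$meth + scoped$unmeth),
                      stringsAsFactors = FALSE))
  }
  res <- iterator
  res$avg <- vapply(seq_len(nrow(iterator)), function(i) {
    hit <- scoped$chrom == iterator$chrom[i] &
      scoped$start >= iterator$start[i] & scoped$start < iterator$end[i]
    if (!any(hit)) return(NA_real_)
    sum(scoped$meth[hit]) / sum(scoped$meth[hit] + scoped$unmeth[hit])
  }, numeric(1))
  res
}

# Stand-in for gintervals.all(): one interval per (toy) chromosome.
whole_genome <- data.frame(chrom = c("chr1", "chr2"), start = c(0, 0),
                           end = c(1000, 1000), stringsAsFactors = FALSE)
chr1_only <- whole_genome[1, ]
bins <- data.frame(chrom = c("chr1", "chr1"), start = c(0, 500),
                   end = c(500, 1000), stringsAsFactors = FALSE)

per_chrom <- avg_meth_toy(cpgs, intervals = whole_genome, iterator = whole_genome)
per_bin   <- avg_meth_toy(cpgs, intervals = chr1_only, iterator = bins)
per_cpg   <- avg_meth_toy(cpgs, intervals = whole_genome)
```

Here `per_chrom` has one average per chromosome, `per_bin` has one average per bin within chr1 only, and `per_cpg` (no iterator, whole-genome scope) has one row for every CpG in the toy table.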