Create a track from bam files.

Creates a misha track from one or more bam files by running the selected steps of the import pipeline.

gpatterns.import_from_bam(bams, workdir = NULL, track = NULL,
  steps = "all", paired_end = TRUE, cgs_mask_file = NULL, trim = NULL,
  umi1_idx = NULL, umi2_idx = NULL, use_seq = FALSE, only_seq = FALSE,
  frag_intervs = NULL, maxdist = 0, rm_off_target = TRUE,
  add_chr_prefix = FALSE, bismark = FALSE, nbins = nrow(gintervals.all()),
  groot = GROOT, import_raw_tcpgs = FALSE, use_sge = FALSE,
  max_jobs = 400, parallel = getOption("gpatterns.parallel"),
  cmd_prefix = "", run_per_interv = TRUE, ...)

Arguments

bams: character vector of paths to the bam files
workdir: directory in which the files are saved (provide a full path)
track: name of the track to generate
steps: steps of the pipeline to run. Possible options are: 'bam2tidy_cpgs', 'filter_dups', 'bind_tidy_cpgs', 'pileup', 'pat_freq', 'pat_cov', or "all" (the default)
paired_end: bam files are paired-end, with R1 and R2 interleaved
cgs_mask_file: comma-separated file with the positions of cpgs to mask (e.g. MspI sticky ends). Must have chrom and start fields giving the position of the 'C' of each cpg to mask
trim: trim cpgs that lie within trim bp of the beginning or end of the read
umi1_idx: position of umi1 in the index (0-based)
umi2_idx: position of umi2 in the index (0-based)
use_seq: use UMI sequence (not only position) to filter duplicates
only_seq: use only UMI sequence (without positions) to filter duplicates
frag_intervs: intervals set of the fragments to which read positions are changed
maxdist: maximal distance of a read from the fragments in frag_intervs
rm_off_target: if TRUE, remove reads whose distance from frag_intervs is greater than maxdist; if FALSE, leave those reads unchanged
add_chr_prefix: add a "chr" prefix to chromosome names (in order to import to misha)
bismark: bam files were aligned using Bismark
nbins: number of genomic bins into which the analysis is split
groot: root of misha genomic database to save the tracks
import_raw_tcpgs: import raw tidy cpgs to misha (without filtering duplicates)
use_sge: use Sun Grid Engine (SGE) for parallelization
max_jobs: maximal number of jobs for sge parallelization
parallel: parallelize using threads (number of threads is determined by gpatterns.set_parallel)
cmd_prefix: prefix to prepend to 'system' commands (e.g. source ~/.bashrc)
run_per_interv: run the bam2tidy_cpgs scripts separately for each interval
...: additional parameters passed to gpatterns.import_from_tidy_cpgs
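
The cgs_mask_file argument expects a comma-separated file with chrom and start fields. A minimal sketch of building such a file with base R is shown here; the coordinates and the file name are invented for illustration, and any further columns the function might accept are not covered.

```r
# Hypothetical positions of the 'C' of each cpg to mask.
# The coordinates are made up for illustration only.
mask <- data.frame(
  chrom = c("chr1", "chr1", "chr2"),
  start = c(10468L, 10470L, 20015L),
  stringsAsFactors = FALSE
)

# Write a comma-separated file whose header row holds the
# chrom and start field names.
mask_file <- file.path(tempdir(), "cgs_mask.csv")
write.csv(mask, mask_file, row.names = FALSE, quote = FALSE)

# Read the file back to check its layout.
mask_check <- read.csv(mask_file, stringsAsFactors = FALSE)
print(mask_check)
```

The resulting path (mask_file) is what would be passed as cgs_mask_file.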

Value

If 'stats' is one of the steps, a data frame with statistics; otherwise nothing.
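
A call using only arguments from the usage above might look like the following sketch. The bam path, working directory and track name are hypothetical placeholders. The sketch also assumes that gpatterns is installed and that a misha database root has already been set, so that the default groot = GROOT resolves. The call is skipped when the package or the input files are missing.

```r
# Hypothetical inputs: replace with real, full paths.
bams <- c("/full/path/to/sample1.bam")

# Arguments taken from the documented usage; values are illustrative.
import_args <- list(
  bams = bams,
  workdir = "/full/path/to/workdir",
  track = "sample1",
  steps = "all",
  paired_end = TRUE
)

# Only run the import when gpatterns is available and the bam files exist.
can_run <- requireNamespace("gpatterns", quietly = TRUE) &&
  all(file.exists(bams))

if (can_run) {
  res <- do.call(gpatterns::gpatterns.import_from_bam, import_args)
} else {
  message("Skipping import: gpatterns or the input bam files are not available.")
}
```

Building the argument list first keeps the run settings in one place, which helps when the same import is repeated for several samples.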
