Creates a new virtual track.
gvtrack.create(vtrack = NULL, src = NULL, func = NULL, params = NULL, ...)
None.
This function creates a new virtual track named 'vtrack' with the given source, function and parameters. 'src' can be either a track or intervals (1D or 2D). Use the following table for a reference of all valid source, function and parameters combinations:
src = [Track], func = "avg", params = NULL
Average track value in
iterator interval.
src = [Track], func = "max", params = NULL
Maximal track value in
iterator interval.
src = [Track], func = "min", params = NULL
Minimal track value in
iterator interval.
src = ['Dense' / 'Sparse' / 'Array' track], func = "nearest", params =
NULL
Mean track value in iterator interval. If there are no track
values covered by an iterator interator (can occur only in 'Sparse' track),
the nearest track value is returned.
src = ['Dense' / 'Sparse' / 'Array' track], func = "stddev", params =
NULL
Unbiased standard deviation of track values in iterator interval.
src = ['Dense' / 'Sparse' / 'Array' track], func = "sum", params =
NULL
Sum of track values in iterator interval.
src = ['Dense' / 'Sparse' / 'Array' track], func = "quantile", params
= [Percentile in the range of [0, 1]]
Quantile of track values in
iterator interval.
src = ['Dense' track], func = "global.percentile", params = NULL
Percentile of an average track value in iterator interval relatively to all
values of the track.
src = ['Dense' track], func = "global.percentile.max", params = NULL
Percentile of a maximal track value in iterator interval relatively to
all values of the track.
src = ['Dense' track], func = "global.percentile.min", params = NULL
Percentile of a minimal track value in iterator interval relatively to
all values of the track.
src = [2D track], func = "area", params = NULL
Area covered by
iterator interval.
src = [2D track], func = "weighted.sum", params = NULL
Weighted
sum of values where each weight equals to the intersection area between the
iterator interval and the rectangle containing the value.
src = [1D intervals], func = "distance", params = [Minimal distance
from center (default: 0)]
Given the center 'C' of the current iterator
interval returns 'DC * X/2', where 'DC' is the normalized distance to the
center of the interval that contains 'C', and 'X' is the value of the
parameter. If no interval contains 'C' the resulted value is 'D + XXX/2'
where 'D' is the distance between 'C' and the edge of the closest interval.
Distance can be positive or negative depending on the position of the
coordinate relative to the interval and the strand (-1 or 1) of the
interval. Distance is always positive if 'strand' is '0' or if 'strand'
column is missing. Distance is 'NA' if no intervals exist for the current
chromosome.
src = [1D intervals], func = "distance.center", params = NULL
Given the center 'C' of the current iterator interval returns 'NaN' if 'C'
is outside of the intervals, otherwise returns the distance between 'C' and
the center of the closest interval. Distance can be positive or negative
depending on the position of the coordinate relative to the interval and the
strand (-1 or 1) of the interval. Distance is always positive if 'strand' is
'0' or if 'strand' column is missing.
src = [1D intervals], func = "coverage", params = NULL
For each iterator interval, calculates the fraction of its length that is covered by the
source intervals. Returns a value between 0 and 1. For example, if an iterator interval is [100,200]
and the source intervals cover positions 120-140 and 160-170, the coverage would be 0.3
((20 + 10) / 100 = 0.3). Overlapping source intervals are first unified.
func = "pwm", params = list(pssm = matrix, bidirect = TRUE,
prior = 0.01, extend = TRUE)
Calculates total log-likelihood score of DNA sequence against PSSM.
Uses log-sum-exp over all positions. For bidirect=TRUE, scans both
strands. Prior adds pseudocounts, extend=TRUE allows scoring at boundaries.
func = "pwm.max", params = list(pssm = matrix, bidirect = TRUE,
prior = 0.01, extend = TRUE)
Returns maximum log-likelihood score of best PSSM match. bidirect=TRUE
checks both strands. Prior adds pseudocounts, extend=TRUE allows boundary
scoring.
func = "pwm.max.pos", params = list(pssm = matrix, bidirect = TRUE,
prior = 0.01, extend = TRUE)
Returns 1-based position of best PSSM match.
If bidirect=TRUE, the position would be positive if the best hit was at the
forward strand, and negative if it was at the reverse strand. When strand is
-1 the position is still according to the forward strand, but the hit is at
the end of the match.
Prior adds pseudocounts, extend=TRUE allows boundary scoring.
For all PWM functions:
pssm: Position-specific scoring matrix (A,C,G,T frequencies)
bidirect: If TRUE, scans both strands; if FALSE, forward only
prior: Pseudocount for frequencies (default: 0.01)
extend: If TRUE, computes boundary scores
strand: If 1, scans forward strand; if -1, scans reverse strand. For strand == 1, the energy (and position of the best match) would be at the beginning of the match, for strand == -1, the energy (and position of the best match) would be at the end of the match.
PWM parameters are accepted as list or individual parameters (see examples).
func = "kmer.count", params = list(kmer = "ACGT", extend = TRUE, strand = 0)
Counts occurrences of the specified kmer in each interval. The extend=TRUE
parameter (default) allows counting kmers that span interval boundaries.
The strand parameter can be 1 (forward strand), -1 (reverse strand), or 0 (both strands).
func = "kmer.frac", params = list(kmer = "ACGT", extend = TRUE, strand = 0)
Calculates the fraction of possible positions in each interval that contain
the specified kmer. The extend=TRUE parameter (default) allows counting kmers
that span interval boundaries. The strand parameter can be 1 (forward strand), -1
(reverse strand), or 0 (both strands).
For kmer functions:
kmer: The DNA sequence to count (case-insensitive)
extend: If TRUE, counts kmers that span interval boundaries
strand: If 1, counts kmers on forward strand; if -1, counts kmers on reverse strand. If 0, counts kmers on both strands. Default is 0.
Kmer parameters are accepted as list or individual parameters (see examples). Note that for palindromic kmers, setting strand to 1 or -1 is recommended to avoid double counting.
Modify iterator behavior with 'gvtrack.iterator' or 'gvtrack.iterator.2d'.
gdb.init_examples()
gvtrack.create("vtrack1", "dense_track", "max")
gvtrack.create("vtrack2", "dense_track", "quantile", 0.5)
gextract("dense_track", "vtrack1", "vtrack2",
gintervals(1, 0, 10000),
iterator = 1000
)
#> chrom start end dense_track vtrack1 vtrack2 intervalID
#> 1 chr1 0 1000 0.08988887 0.20 0.06 1
#> 2 chr1 1000 2000 0.04100000 0.26 0.02 1
#> 3 chr1 2000 3000 0.04100000 0.10 0.04 1
#> 4 chr1 3000 4000 0.04100000 0.14 0.04 1
#> 5 chr1 4000 5000 0.04300000 0.18 0.04 1
#> 6 chr1 5000 6000 0.04000000 0.10 0.04 1
#> 7 chr1 6000 7000 0.05799999 0.22 0.04 1
#> 8 chr1 7000 8000 0.02400000 0.06 0.02 1
#> 9 chr1 8000 9000 0.03300000 0.10 0.03 1
#> 10 chr1 9000 10000 0.06899999 0.18 0.07 1
gvtrack.create("vtrack3", "dense_track", "global.percentile")
gvtrack.create("vtrack4", "annotations", "distance")
gdist(
"vtrack3", seq(0, 1, l = 10), "vtrack4",
seq(-500, 500, 200)
)
#> (-500,-300] (-300,-100] (-100,100] (100,300] (300,500]
#> (0,0.111111] 2 5 77 7 8
#> (0.111111,0.222222] 0 0 0 0 0
#> (0.222222,0.333333] 5 7 36 9 7
#> (0.333333,0.444444] 5 2 32 3 1
#> (0.444444,0.555556] 2 2 24 2 2
#> (0.555556,0.666667] 4 1 29 6 2
#> (0.666667,0.777778] 0 2 29 2 4
#> (0.777778,0.888889] 0 1 35 0 1
#> (0.888889,1] 2 3 18 2 3
#> attr(,"breaks")
#> attr(,"breaks")[[1]]
#> [1] 0.0000000 0.1111111 0.2222222 0.3333333 0.4444444 0.5555556 0.6666667
#> [8] 0.7777778 0.8888889 1.0000000
#>
#> attr(,"breaks")[[2]]
#> [1] -500 -300 -100 100 300 500
#>
gvtrack.create("cov", "annotations", "coverage")
gextract("cov", gintervals(1, 0, 1000), iterator = 100)
#> chrom start end cov intervalID
#> 1 chr1 0 100 0.8 1
#> 2 chr1 100 200 1.0 1
#> 3 chr1 200 300 1.0 1
#> 4 chr1 300 400 1.0 1
#> 5 chr1 400 500 1.0 1
#> 6 chr1 500 600 1.0 1
#> 7 chr1 600 700 1.0 1
#> 8 chr1 700 800 1.0 1
#> 9 chr1 800 900 1.0 1
#> 10 chr1 900 1000 1.0 1
pssm <- matrix(
c(
0.7, 0.1, 0.1, 0.1, # Example PSSM
0.1, 0.7, 0.1, 0.1,
0.1, 0.1, 0.7, 0.1,
0.1, 0.1, 0.7, 0.1,
0.1, 0.1, 0.7, 0.1,
0.1, 0.1, 0.7, 0.1
),
ncol = 4, byrow = TRUE
)
colnames(pssm) <- c("A", "C", "G", "T")
gvtrack.create(
"motif_score", NULL, "pwm",
list(pssm = pssm, bidirect = TRUE, prior = 0.01)
)
gvtrack.create("max_motif_score", NULL, "pwm.max",
pssm = pssm, bidirect = TRUE, prior = 0.01
)
gvtrack.create("max_motif_pos", NULL, "pwm.max.pos",
pssm = pssm
)
gextract(
c(
"dense_track", "motif_score", "max_motif_score",
"max_motif_pos"
),
gintervals(1, 0, 10000),
iterator = 500
)
#> chrom start end dense_track motif_score max_motif_score max_motif_pos
#> 1 chr1 0 500 0.1577778 -1.7766590 -5.326688 -147
#> 2 chr1 500 1000 0.0220000 -0.9113574 -4.131330 178
#> 3 chr1 1000 1500 0.0540000 -1.3073393 -4.131330 32
#> 4 chr1 1500 2000 0.0280000 -1.5331398 -4.131330 203
#> 5 chr1 2000 2500 0.0420000 -1.4151764 -4.151338 -127
#> 6 chr1 2500 3000 0.0400000 -0.9677283 -2.289690 -135
#> 7 chr1 3000 3500 0.0320000 -1.3787552 -4.151338 22
#> 8 chr1 3500 4000 0.0500000 -1.2041827 -4.151338 4
#> 9 chr1 4000 4500 0.0360000 -1.0488071 -2.289690 -39
#> 10 chr1 4500 5000 0.0500000 -0.9755251 -4.131330 -55
#> 11 chr1 5000 5500 0.0460000 -1.4299518 -4.131330 11
#> 12 chr1 5500 6000 0.0340000 -1.2019978 -4.131330 62
#> 13 chr1 6000 6500 0.0800000 -1.6382426 -4.151338 152
#> 14 chr1 6500 7000 0.0360000 -1.1377980 -4.131330 20
#> 15 chr1 7000 7500 0.0220000 -1.0761293 -4.131330 8
#> 16 chr1 7500 8000 0.0260000 -1.0666249 -4.131330 174
#> 17 chr1 8000 8500 0.0320000 -0.9999036 -2.289690 75
#> 18 chr1 8500 9000 0.0340000 -0.9541358 -2.289690 189
#> 19 chr1 9000 9500 0.0580000 -0.8253071 -2.289690 450
#> 20 chr1 9500 10000 0.0800000 -1.5154103 -4.151338 -248
#> intervalID
#> 1 1
#> 2 1
#> 3 1
#> 4 1
#> 5 1
#> 6 1
#> 7 1
#> 8 1
#> 9 1
#> 10 1
#> 11 1
#> 12 1
#> 13 1
#> 14 1
#> 15 1
#> 16 1
#> 17 1
#> 18 1
#> 19 1
#> 20 1
# Kmer counting examples
gvtrack.create("cg_count", NULL, "kmer.count", kmer = "CG", strand = 1)
gvtrack.create("cg_frac", NULL, "kmer.frac", kmer = "CG", strand = 1)
gextract(c("cg_count", "cg_frac"), gintervals(1, 0, 10000), iterator = 1000)
#> chrom start end cg_count cg_frac intervalID
#> 1 chr1 0 1000 73 0.07307307 1
#> 2 chr1 1000 2000 27 0.02702703 1
#> 3 chr1 2000 3000 23 0.02302302 1
#> 4 chr1 3000 4000 15 0.01501502 1
#> 5 chr1 4000 5000 27 0.02702703 1
#> 6 chr1 5000 6000 25 0.02502503 1
#> 7 chr1 6000 7000 18 0.01801802 1
#> 8 chr1 7000 8000 25 0.02502503 1
#> 9 chr1 8000 9000 20 0.02002002 1
#> 10 chr1 9000 10000 24 0.02402402 1
gvtrack.create("at_pos", NULL, "kmer.count", kmer = "AT", strand = 1)
gvtrack.create("at_neg", NULL, "kmer.count", kmer = "AT", strand = -1)
gvtrack.create("at_both", NULL, "kmer.count", kmer = "AT", strand = 0)
#> Warning: kmer sequence 'AT' is palindromic, please set strand to 1 or -1 to avoid double counting
gextract(c("at_pos", "at_neg", "at_both"), gintervals(1, 0, 10000), iterator = 1000)
#> chrom start end at_pos at_neg at_both intervalID
#> 1 chr1 0 1000 4 4 8 1
#> 2 chr1 1000 2000 47 47 94 1
#> 3 chr1 2000 3000 28 28 56 1
#> 4 chr1 3000 4000 33 33 66 1
#> 5 chr1 4000 5000 24 24 48 1
#> 6 chr1 5000 6000 19 19 38 1
#> 7 chr1 6000 7000 34 34 68 1
#> 8 chr1 7000 8000 31 31 62 1
#> 9 chr1 8000 9000 40 40 80 1
#> 10 chr1 9000 10000 32 32 64 1
# GC content
gvtrack.create("g_frac", NULL, "kmer.frac", kmer = "G")
gvtrack.create("c_frac", NULL, "kmer.frac", kmer = "C")
gextract("g_frac + c_frac", gintervals(1, 0, 10000),
iterator = 1000,
colnames = "gc_content"
)
#> chrom start end gc_content intervalID
#> 1 chr1 0 1000 0.629 1
#> 2 chr1 1000 2000 0.522 1
#> 3 chr1 2000 3000 0.603 1
#> 4 chr1 3000 4000 0.570 1
#> 5 chr1 4000 5000 0.591 1
#> 6 chr1 5000 6000 0.622 1
#> 7 chr1 6000 7000 0.557 1
#> 8 chr1 7000 8000 0.626 1
#> 9 chr1 8000 9000 0.587 1
#> 10 chr1 9000 10000 0.565 1