Creates a new virtual track.

gvtrack.create(vtrack = NULL, src = NULL, func = NULL, params = NULL, ...)

Arguments

vtrack

virtual track name

src

source (track/intervals). NULL for PWM functions

func

function name (see above)

params

function parameters (see above)

...

additional PWM parameters

Value

None.

Details

This function creates a new virtual track named 'vtrack' with the given source, function and parameters. 'src' can be either a track or intervals (1D or 2D). Use the following table for a reference of all valid source, function and parameters combinations:

src = [Track], func = "avg", params = NULL
Average track value in iterator interval.

src = [Track], func = "max", params = NULL
Maximal track value in iterator interval.

src = [Track], func = "min", params = NULL
Minimal track value in iterator interval.

src = ['Dense' / 'Sparse' / 'Array' track], func = "nearest", params = NULL
Mean track value in iterator interval. If there are no track values covered by an iterator interator (can occur only in 'Sparse' track), the nearest track value is returned.

src = ['Dense' / 'Sparse' / 'Array' track], func = "stddev", params = NULL
Unbiased standard deviation of track values in iterator interval.

src = ['Dense' / 'Sparse' / 'Array' track], func = "sum", params = NULL
Sum of track values in iterator interval.

src = ['Dense' / 'Sparse' / 'Array' track], func = "quantile", params = [Percentile in the range of [0, 1]]
Quantile of track values in iterator interval.

src = ['Dense' track], func = "global.percentile", params = NULL
Percentile of an average track value in iterator interval relatively to all values of the track.

src = ['Dense' track], func = "global.percentile.max", params = NULL
Percentile of a maximal track value in iterator interval relatively to all values of the track.

src = ['Dense' track], func = "global.percentile.min", params = NULL
Percentile of a minimal track value in iterator interval relatively to all values of the track.

src = [2D track], func = "area", params = NULL
Area covered by iterator interval.

src = [2D track], func = "weighted.sum", params = NULL
Weighted sum of values where each weight equals to the intersection area between the iterator interval and the rectangle containing the value.

src = [1D intervals], func = "distance", params = [Minimal distance from center (default: 0)]
Given the center 'C' of the current iterator interval returns 'DC * X/2', where 'DC' is the normalized distance to the center of the interval that contains 'C', and 'X' is the value of the parameter. If no interval contains 'C' the resulted value is 'D + XXX/2' where 'D' is the distance between 'C' and the edge of the closest interval. Distance can be positive or negative depending on the position of the coordinate relative to the interval and the strand (-1 or 1) of the interval. Distance is always positive if 'strand' is '0' or if 'strand' column is missing. Distance is 'NA' if no intervals exist for the current chromosome.

src = [1D intervals], func = "distance.center", params = NULL
Given the center 'C' of the current iterator interval returns 'NaN' if 'C' is outside of the intervals, otherwise returns the distance between 'C' and the center of the closest interval. Distance can be positive or negative depending on the position of the coordinate relative to the interval and the strand (-1 or 1) of the interval. Distance is always positive if 'strand' is '0' or if 'strand' column is missing.

src = [1D intervals], func = "coverage", params = NULL
For each iterator interval, calculates the fraction of its length that is covered by the source intervals. Returns a value between 0 and 1. For example, if an iterator interval is [100,200] and the source intervals cover positions 120-140 and 160-170, the coverage would be 0.3 ((20 + 10) / 100 = 0.3). Overlapping source intervals are first unified.

func = "pwm", params = list(pssm = matrix, bidirect = TRUE, prior = 0.01, extend = TRUE)
Calculates total log-likelihood score of DNA sequence against PSSM. Uses log-sum-exp over all positions. For bidirect=TRUE, scans both strands. Prior adds pseudocounts, extend=TRUE allows scoring at boundaries.

func = "pwm.max", params = list(pssm = matrix, bidirect = TRUE, prior = 0.01, extend = TRUE)
Returns maximum log-likelihood score of best PSSM match. bidirect=TRUE checks both strands. Prior adds pseudocounts, extend=TRUE allows boundary scoring.

func = "pwm.max.pos", params = list(pssm = matrix, bidirect = TRUE, prior = 0.01, extend = TRUE)
Returns 1-based position of best PSSM match. If bidirect=TRUE, the position would be positive if the best hit was at the forward strand, and negative if it was at the reverse strand. When strand is -1 the position is still according to the forward strand, but the hit is at the end of the match. Prior adds pseudocounts, extend=TRUE allows boundary scoring.

For all PWM functions:

  • pssm: Position-specific scoring matrix (A,C,G,T frequencies)

  • bidirect: If TRUE, scans both strands; if FALSE, forward only

  • prior: Pseudocount for frequencies (default: 0.01)

  • extend: If TRUE, computes boundary scores

  • strand: If 1, scans forward strand; if -1, scans reverse strand. For strand == 1, the energy (and position of the best match) would be at the beginning of the match, for strand == -1, the energy (and position of the best match) would be at the end of the match.

PWM parameters are accepted as list or individual parameters (see examples).

func = "kmer.count", params = list(kmer = "ACGT", extend = TRUE, strand = 0)
Counts occurrences of the specified kmer in each interval. The extend=TRUE parameter (default) allows counting kmers that span interval boundaries. The strand parameter can be 1 (forward strand), -1 (reverse strand), or 0 (both strands).

func = "kmer.frac", params = list(kmer = "ACGT", extend = TRUE, strand = 0)
Calculates the fraction of possible positions in each interval that contain the specified kmer. The extend=TRUE parameter (default) allows counting kmers that span interval boundaries. The strand parameter can be 1 (forward strand), -1 (reverse strand), or 0 (both strands).

For kmer functions:

  • kmer: The DNA sequence to count (case-insensitive)

  • extend: If TRUE, counts kmers that span interval boundaries

  • strand: If 1, counts kmers on forward strand; if -1, counts kmers on reverse strand. If 0, counts kmers on both strands. Default is 0.

Kmer parameters are accepted as list or individual parameters (see examples). Note that for palindromic kmers, setting strand to 1 or -1 is recommended to avoid double counting.

Modify iterator behavior with 'gvtrack.iterator' or 'gvtrack.iterator.2d'.

Examples


gdb.init_examples()

gvtrack.create("vtrack1", "dense_track", "max")
gvtrack.create("vtrack2", "dense_track", "quantile", 0.5)
gextract("dense_track", "vtrack1", "vtrack2",
    gintervals(1, 0, 10000),
    iterator = 1000
)
#>    chrom start   end dense_track vtrack1 vtrack2 intervalID
#> 1   chr1     0  1000  0.08988887    0.20    0.06          1
#> 2   chr1  1000  2000  0.04100000    0.26    0.02          1
#> 3   chr1  2000  3000  0.04100000    0.10    0.04          1
#> 4   chr1  3000  4000  0.04100000    0.14    0.04          1
#> 5   chr1  4000  5000  0.04300000    0.18    0.04          1
#> 6   chr1  5000  6000  0.04000000    0.10    0.04          1
#> 7   chr1  6000  7000  0.05799999    0.22    0.04          1
#> 8   chr1  7000  8000  0.02400000    0.06    0.02          1
#> 9   chr1  8000  9000  0.03300000    0.10    0.03          1
#> 10  chr1  9000 10000  0.06899999    0.18    0.07          1

gvtrack.create("vtrack3", "dense_track", "global.percentile")
gvtrack.create("vtrack4", "annotations", "distance")
gdist(
    "vtrack3", seq(0, 1, l = 10), "vtrack4",
    seq(-500, 500, 200)
)
#>                     (-500,-300] (-300,-100] (-100,100] (100,300] (300,500]
#> (0,0.111111]                  2           5         77         7         8
#> (0.111111,0.222222]           0           0          0         0         0
#> (0.222222,0.333333]           5           7         36         9         7
#> (0.333333,0.444444]           5           2         32         3         1
#> (0.444444,0.555556]           2           2         24         2         2
#> (0.555556,0.666667]           4           1         29         6         2
#> (0.666667,0.777778]           0           2         29         2         4
#> (0.777778,0.888889]           0           1         35         0         1
#> (0.888889,1]                  2           3         18         2         3
#> attr(,"breaks")
#> attr(,"breaks")[[1]]
#>  [1] 0.0000000 0.1111111 0.2222222 0.3333333 0.4444444 0.5555556 0.6666667
#>  [8] 0.7777778 0.8888889 1.0000000
#> 
#> attr(,"breaks")[[2]]
#> [1] -500 -300 -100  100  300  500
#> 

gvtrack.create("cov", "annotations", "coverage")
gextract("cov", gintervals(1, 0, 1000), iterator = 100)
#>    chrom start  end cov intervalID
#> 1   chr1     0  100 0.8          1
#> 2   chr1   100  200 1.0          1
#> 3   chr1   200  300 1.0          1
#> 4   chr1   300  400 1.0          1
#> 5   chr1   400  500 1.0          1
#> 6   chr1   500  600 1.0          1
#> 7   chr1   600  700 1.0          1
#> 8   chr1   700  800 1.0          1
#> 9   chr1   800  900 1.0          1
#> 10  chr1   900 1000 1.0          1

pssm <- matrix(
    c(
        0.7, 0.1, 0.1, 0.1, # Example PSSM
        0.1, 0.7, 0.1, 0.1,
        0.1, 0.1, 0.7, 0.1,
        0.1, 0.1, 0.7, 0.1,
        0.1, 0.1, 0.7, 0.1,
        0.1, 0.1, 0.7, 0.1
    ),
    ncol = 4, byrow = TRUE
)
colnames(pssm) <- c("A", "C", "G", "T")
gvtrack.create(
    "motif_score", NULL, "pwm",
    list(pssm = pssm, bidirect = TRUE, prior = 0.01)
)
gvtrack.create("max_motif_score", NULL, "pwm.max",
    pssm = pssm, bidirect = TRUE, prior = 0.01
)
gvtrack.create("max_motif_pos", NULL, "pwm.max.pos",
    pssm = pssm
)
gextract(
    c(
        "dense_track", "motif_score", "max_motif_score",
        "max_motif_pos"
    ),
    gintervals(1, 0, 10000),
    iterator = 500
)
#>    chrom start   end dense_track motif_score max_motif_score max_motif_pos
#> 1   chr1     0   500   0.1577778  -1.7766590       -5.326688          -147
#> 2   chr1   500  1000   0.0220000  -0.9113574       -4.131330           178
#> 3   chr1  1000  1500   0.0540000  -1.3073393       -4.131330            32
#> 4   chr1  1500  2000   0.0280000  -1.5331398       -4.131330           203
#> 5   chr1  2000  2500   0.0420000  -1.4151764       -4.151338          -127
#> 6   chr1  2500  3000   0.0400000  -0.9677283       -2.289690          -135
#> 7   chr1  3000  3500   0.0320000  -1.3787552       -4.151338            22
#> 8   chr1  3500  4000   0.0500000  -1.2041827       -4.151338             4
#> 9   chr1  4000  4500   0.0360000  -1.0488071       -2.289690           -39
#> 10  chr1  4500  5000   0.0500000  -0.9755251       -4.131330           -55
#> 11  chr1  5000  5500   0.0460000  -1.4299518       -4.131330            11
#> 12  chr1  5500  6000   0.0340000  -1.2019978       -4.131330            62
#> 13  chr1  6000  6500   0.0800000  -1.6382426       -4.151338           152
#> 14  chr1  6500  7000   0.0360000  -1.1377980       -4.131330            20
#> 15  chr1  7000  7500   0.0220000  -1.0761293       -4.131330             8
#> 16  chr1  7500  8000   0.0260000  -1.0666249       -4.131330           174
#> 17  chr1  8000  8500   0.0320000  -0.9999036       -2.289690            75
#> 18  chr1  8500  9000   0.0340000  -0.9541358       -2.289690           189
#> 19  chr1  9000  9500   0.0580000  -0.8253071       -2.289690           450
#> 20  chr1  9500 10000   0.0800000  -1.5154103       -4.151338          -248
#>    intervalID
#> 1           1
#> 2           1
#> 3           1
#> 4           1
#> 5           1
#> 6           1
#> 7           1
#> 8           1
#> 9           1
#> 10          1
#> 11          1
#> 12          1
#> 13          1
#> 14          1
#> 15          1
#> 16          1
#> 17          1
#> 18          1
#> 19          1
#> 20          1

# Kmer counting examples
gvtrack.create("cg_count", NULL, "kmer.count", kmer = "CG", strand = 1)
gvtrack.create("cg_frac", NULL, "kmer.frac", kmer = "CG", strand = 1)
gextract(c("cg_count", "cg_frac"), gintervals(1, 0, 10000), iterator = 1000)
#>    chrom start   end cg_count    cg_frac intervalID
#> 1   chr1     0  1000       73 0.07307307          1
#> 2   chr1  1000  2000       27 0.02702703          1
#> 3   chr1  2000  3000       23 0.02302302          1
#> 4   chr1  3000  4000       15 0.01501502          1
#> 5   chr1  4000  5000       27 0.02702703          1
#> 6   chr1  5000  6000       25 0.02502503          1
#> 7   chr1  6000  7000       18 0.01801802          1
#> 8   chr1  7000  8000       25 0.02502503          1
#> 9   chr1  8000  9000       20 0.02002002          1
#> 10  chr1  9000 10000       24 0.02402402          1

gvtrack.create("at_pos", NULL, "kmer.count", kmer = "AT", strand = 1)
gvtrack.create("at_neg", NULL, "kmer.count", kmer = "AT", strand = -1)
gvtrack.create("at_both", NULL, "kmer.count", kmer = "AT", strand = 0)
#> Warning: kmer sequence 'AT' is palindromic, please set strand to 1 or -1 to avoid double counting
gextract(c("at_pos", "at_neg", "at_both"), gintervals(1, 0, 10000), iterator = 1000)
#>    chrom start   end at_pos at_neg at_both intervalID
#> 1   chr1     0  1000      4      4       8          1
#> 2   chr1  1000  2000     47     47      94          1
#> 3   chr1  2000  3000     28     28      56          1
#> 4   chr1  3000  4000     33     33      66          1
#> 5   chr1  4000  5000     24     24      48          1
#> 6   chr1  5000  6000     19     19      38          1
#> 7   chr1  6000  7000     34     34      68          1
#> 8   chr1  7000  8000     31     31      62          1
#> 9   chr1  8000  9000     40     40      80          1
#> 10  chr1  9000 10000     32     32      64          1

# GC content
gvtrack.create("g_frac", NULL, "kmer.frac", kmer = "G")
gvtrack.create("c_frac", NULL, "kmer.frac", kmer = "C")
gextract("g_frac + c_frac", gintervals(1, 0, 10000),
    iterator = 1000,
    colnames = "gc_content"
)
#>    chrom start   end gc_content intervalID
#> 1   chr1     0  1000      0.629          1
#> 2   chr1  1000  2000      0.522          1
#> 3   chr1  2000  3000      0.603          1
#> 4   chr1  3000  4000      0.570          1
#> 5   chr1  4000  5000      0.591          1
#> 6   chr1  5000  6000      0.622          1
#> 7   chr1  6000  7000      0.557          1
#> 8   chr1  7000  8000      0.626          1
#> 9   chr1  8000  9000      0.587          1
#> 10  chr1  9000 10000      0.565          1