Finds neighbors between two sets of intervals.
gintervals.neighbors(
    intervals1 = NULL,
    intervals2 = NULL,
    maxneighbors = 1,
    mindist = -1e+09,
    maxdist = 1e+09,
    mindist1 = -1e+09,
    maxdist1 = 1e+09,
    mindist2 = -1e+09,
    maxdist2 = 1e+09,
    na.if.notfound = FALSE,
    intervals.set.out = NULL
)
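intervals1, intervals2: intervals
maxneighbors: maximal number of neighbors
mindist, maxdist: distance range for 1D intervals
mindist1, maxdist1, mindist2, maxdist2: distance range for 2D intervals
na.if.notfound: if 'TRUE', return an 'NA' neighbor for intervals with no matching neighbors; otherwise omit such intervals from the result
intervals.set.out: name of the intervals set to which the result is optionally saved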
If 'intervals.set.out' is 'NULL', a data frame containing the pairs of intervals from 'intervals1' and 'intervals2', plus an additional column named 'dist' ('dist1' and 'dist2' for 2D intervals) holding the distance between the paired intervals. The columns of the intervals from 'intervals2' are renamed to 'chrom1', 'start1' and 'end1' (for 2D intervals: 'chrom11', 'start11', 'end11' and 'chrom22', 'start22', 'end22'). If 'na.if.notfound' is 'TRUE', the data frame contains all the intervals from 'intervals1', including those for which no matching neighbor was found; for these, the neighboring interval is 'NA'. If 'na.if.notfound' is 'FALSE', the data frame contains only those intervals from 'intervals1' for which at least one matching neighbor was found.
For each interval in 'intervals1', this function finds the closest 'maxneighbors' intervals from 'intervals2'.
For 1D intervals the distance must fall in the range ['mindist', 'maxdist']. If 'intervals2' contains a 'strand' column, the distance can be positive or negative depending on the 'strand' value and the position of interval2 relative to interval1. If the 'strand' column is missing, the distance is never negative.
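The sign convention can be illustrated with a small base-R sketch. The helper 'signed_dist' below is hypothetical and not part of the package; its rule is inferred from the example output on this page (zero for overlapping intervals, negative when interval1 lies upstream of interval2 with respect to interval2's strand), so treat it as an illustration under those assumptions rather than the package's implementation.

```r
# Hypothetical helper, not part of the package: 1D distance between
# interval1 and interval2, signed by interval2's strand (NA = unsigned).
signed_dist <- function(start1, end1, start2, end2, strand2 = NA) {
    if (end1 <= start2) {
        gap <- start2 - end1 # interval1 lies to the left of interval2
        side <- -1
    } else if (end2 <= start1) {
        gap <- start1 - end2 # interval1 lies to the right of interval2
        side <- 1
    } else {
        return(0) # overlapping intervals
    }
    if (is.na(strand2)) {
        return(gap) # no strand information: unsigned distance
    }
    side * strand2 * gap
}

d_unsigned <- signed_dist(466, 699, 250, 300)                  # 166
d_minus <- signed_dist(466, 699, 250, 300, strand2 = -1)       # -166
d_upstream <- signed_dist(932, 1165, 1400, 1450, strand2 = 1)  # -235
```

The three calls mirror rows of the example output on this page: the same pair of intervals yields 166 without a strand and -166 once interval2 is on the minus strand.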
For 2D intervals two distances are calculated and returned, one for each axis. The distances must fall in the range ['mindist1', 'maxdist1'] for axis 1 and ['mindist2', 'maxdist2'] for axis 2. The closest 'maxneighbors' intervals are selected by Manhattan distance (i.e. dist1+dist2).
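The 2D selection rule can be sketched in base R as a filter followed by a ranking. The candidate table, its 'id' column and the distance values below are made up for illustration, and the code only mimics the rule described above; it is not the package's implementation.

```r
# Hypothetical candidate neighbors of a single 2D interval, with the
# per-axis distances already computed (all values are made up).
cand <- data.frame(
    id = c("a", "b", "c", "d"),
    dist1 = c(10, 0, 300, 40),
    dist2 = c(50, 20, 5, 900),
    stringsAsFactors = FALSE
)
maxneighbors <- 2
mindist1 <- 0
maxdist1 <- 500
mindist2 <- 0
maxdist2 <- 500

# Keep candidates whose distance is within the allowed range on both axes.
in_range <- cand$dist1 >= mindist1 & cand$dist1 <= maxdist1 &
    cand$dist2 >= mindist2 & cand$dist2 <= maxdist2
cand <- cand[in_range, ]

# Rank by Manhattan distance (dist1 + dist2) and keep the closest ones.
cand$manhattan <- cand$dist1 + cand$dist2
closest <- head(cand[order(cand$manhattan), ], maxneighbors)
closest$id
#> [1] "b" "a"
```

Candidate "d" is dropped because its axis 2 distance is out of range, and "c" loses to "a" and "b" on the summed distance even though it is the closest on axis 2.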
The names of the returned columns are made unique using 'make.unique(colnames(df), sep = "")', assuming 'df' is the result.
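As a small illustration of this renaming for the 1D case, the snippet below applies 'make.unique' to a vector of column names. The layout of the names before de-duplication (the columns of 'intervals1', then the same names from 'intervals2', then 'dist') is an assumption made for the example.

```r
# Assumed column names before de-duplication: interval columns from
# 'intervals1', the same names from 'intervals2', then the distance.
cols <- c("chrom", "start", "end", "chrom", "start", "end", "dist")

# With sep = "" the sequence number is appended directly to the name.
unique_cols <- make.unique(cols, sep = "")
unique_cols
#> [1] "chrom"  "start"  "end"    "chrom1" "start1" "end1"   "dist"
```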
If 'intervals.set.out' is not 'NULL', the result is saved as an intervals set. Use this parameter if the result size exceeds the limits of physical memory.
gdb.init_examples()
intervs1 <- giterator.intervals("dense_track",
    gintervals(1, 0, 4000),
    iterator = 233
)
intervs2 <- giterator.intervals(
    "sparse_track",
    gintervals(1, 0, 2000)
)
gintervals.neighbors(intervs1, intervs2, 10,
    mindist = -300,
    maxdist = 500
)
#>    chrom start  end chrom1 start1 end1 dist
#> 1   chr1     0  233   chr1      0   50    0
#> 2   chr1     0  233   chr1    100  150    0
#> 3   chr1     0  233   chr1    250  300   17
#> 4   chr1   233  466   chr1    250  300    0
#> 5   chr1   233  466   chr1    100  150   83
#> 6   chr1   233  466   chr1      0   50  183
#> 7   chr1   466  699   chr1    250  300  166
#> 8   chr1   466  699   chr1    100  150  316
#> 9   chr1   466  699   chr1      0   50  416
#> 10  chr1   699  932   chr1    250  300  399
#> 11  chr1   699  932   chr1   1400 1450  468
#> 12  chr1   932 1165   chr1   1400 1450  235
#> 13  chr1  1165 1398   chr1   1400 1450    2
#> 14  chr1  1398 1631   chr1   1400 1450    0
#> 15  chr1  1631 1864   chr1   1400 1450  181
#> 16  chr1  1864 2097   chr1   1400 1450  414
intervs2$strand <- c(1, 1, -1, 1)
gintervals.neighbors(intervs1, intervs2, 10,
    mindist = -300,
    maxdist = 500
)
#>    chrom start  end chrom1 start1 end1 strand dist
#> 1   chr1     0  233   chr1      0   50      1    0
#> 2   chr1     0  233   chr1    100  150      1    0
#> 3   chr1     0  233   chr1    250  300     -1   17
#> 4   chr1   233  466   chr1    250  300     -1    0
#> 5   chr1   233  466   chr1    100  150      1   83
#> 6   chr1   233  466   chr1      0   50      1  183
#> 7   chr1   466  699   chr1    250  300     -1 -166
#> 8   chr1   466  699   chr1    100  150      1  316
#> 9   chr1   466  699   chr1      0   50      1  416
#> 10  chr1   932 1165   chr1   1400 1450      1 -235
#> 11  chr1  1165 1398   chr1   1400 1450      1   -2
#> 12  chr1  1398 1631   chr1   1400 1450      1    0
#> 13  chr1  1631 1864   chr1   1400 1450      1  181
#> 14  chr1  1864 2097   chr1   1400 1450      1  414