Intervals

Functions for creating, manipulating, and querying genomic intervals, including set operations, annotation, normalization, and I/O.

pymisha.gintervals

gintervals(chroms, starts=0, ends=-1, strand=None)

Create a 1D intervals DataFrame.

Constructs an intervals DataFrame from parallel arrays of chromosome names, start coordinates, and end coordinates. Scalar arguments are broadcast to match the longest array.
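The broadcasting rule can be pictured with NumPy and pandas. This is an illustrative sketch, not pymisha's implementation: `make_intervals` is a hypothetical helper, it assumes NumPy-style broadcasting (each argument has either the common length or length 1), and unlike `pm.gintervals` it neither resolves `ends=-1` to chromosome lengths nor normalizes chromosome names.

```python
import numpy as np
import pandas as pd


def make_intervals(chroms, starts=0, ends=-1):
    """Hypothetical helper: broadcast scalars, then build a sorted intervals frame."""
    # np.broadcast_arrays stretches length-1 inputs to the common length and
    # raises ValueError when two longer inputs have different lengths.
    chrom_arr, start_arr, end_arr = np.broadcast_arrays(
        np.atleast_1d(np.asarray(chroms, dtype=object)),
        np.atleast_1d(starts),
        np.atleast_1d(ends),
    )
    df = pd.DataFrame({
        "chrom": chrom_arr.astype(str),
        "start": start_arr.astype(np.int64),
        "end": end_arr.astype(np.int64),
    })
    # Sort by chromosome, then start, like the documented return value.
    return df.sort_values(["chrom", "start"], ignore_index=True)


print(make_intervals(["chr2", "chrX"], 10, [3000, 5000]))
```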

PARAMETER	DESCRIPTION
`chroms`	Chromosome names. Can be strings like `"chr1"` or integers like `1`. TYPE: `str, int, or list`
`starts`	Start coordinates (0-based, inclusive). TYPE: `int or list of int` DEFAULT: `0`
`ends`	End coordinates (0-based, exclusive). `-1` means full chromosome length. TYPE: `int or list of int` DEFAULT: `-1`
`strand`	Strand information (`-1`, `0`, or `1`). Note: this interval convention differs from liftover chain tables, where strand fields are encoded as `0` (`+`) or `1` (`-`). TYPE: `int or list of int` DEFAULT: `None`

RETURNS	DESCRIPTION
`DataFrame`	Sorted intervals with columns: chrom, start, end (and optionally strand).

See Also

gintervals_all : Return full-chromosome intervals for every chromosome.
gintervals_2d : Create 2D intervals.
gintervals_from_tuples : Create intervals from a list of tuples.
gintervals_from_strings : Create intervals from region strings.
gintervals_from_bed : Create intervals from a BED file.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()

The following calls produce equivalent results:

>>> pm.gintervals(1)
>>> pm.gintervals("1")
>>> pm.gintervals("chr1")

Specify start coordinates:

>>> pm.gintervals(1, 1000)

Multiple intervals with broadcast:

>>> pm.gintervals(["chr2", "chrX"], 10, [3000, 5000])

pymisha.gintervals_all

gintervals_all()

Return all chromosome intervals (ALLGENOME).

Returns a DataFrame with one row per chromosome, covering the full extent of each chromosome in the current genome database as defined by chrom_sizes.txt.

RETURNS	DESCRIPTION
`DataFrame`	Intervals with columns: chrom, start, end.

See Also

gintervals : Create a custom set of 1D intervals.
gintervals_2d_all : Return 2D intervals covering the whole genome.
gintervals_from_tuples : Create intervals from a list of tuples.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> pm.gintervals_all()

pymisha.gintervals_2d

gintervals_2d(chroms1, starts1=0, ends1=-1, chroms2=None, starts2=0, ends2=-1)

Create a set of 2D genomic intervals.

PARAMETER	DESCRIPTION
`chroms1`	Chromosome name(s) for first dimension. TYPE: `str, int, or list`
`starts1`	Start coordinate(s) for first dimension. TYPE: `int or list` DEFAULT: `0`
`ends1`	End coordinate(s) for first dimension. -1 means full chromosome length. TYPE: `int or list` DEFAULT: `-1`
`chroms2`	Chromosome name(s) for second dimension. Defaults to chroms1. TYPE: `str, int, list, or None` DEFAULT: `None`
`starts2`	Start coordinate(s) for second dimension. TYPE: `int or list` DEFAULT: `0`
`ends2`	End coordinate(s) for second dimension. -1 means full chromosome length. TYPE: `int or list` DEFAULT: `-1`

RETURNS	DESCRIPTION
`DataFrame`	Sorted 2D intervals with columns: chrom1, start1, end1, chrom2, start2, end2.

See Also

gintervals : Create 1D intervals.
gintervals_2d_all : Return 2D intervals covering the whole genome.
gintervals_2d_band_intersect : Intersect 2D intervals with a diagonal band.
gintervals_force_range : Clamp intervals to chromosome boundaries.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()

The following calls produce equivalent results:

>>> pm.gintervals_2d(1)
>>> pm.gintervals_2d("1")
>>> pm.gintervals_2d("chr1")

Explicit coordinates on both dimensions:

>>> pm.gintervals_2d(1, 1000, 2000, "chrX", 400, 800)

Multiple intervals with broadcast:

>>> pm.gintervals_2d(["chr2", "chrX"], 10, [3000, 5000], 1)

pymisha.gintervals_2d_all

gintervals_2d_all(mode='diagonal')

Return 2D intervals covering the whole genome.

PARAMETER	DESCRIPTION
`mode`	"diagonal" returns only intra-chromosomal pairs (chrom1 == chrom2). "full" returns all NxN chromosome pairs. TYPE: `str` DEFAULT: `"diagonal"`

RETURNS	DESCRIPTION
`DataFrame`	2D intervals with columns: chrom1, start1, end1, chrom2, start2, end2.

See Also

gintervals_2d : Create a custom set of 2D intervals.
gintervals_all : Return 1D intervals covering the whole genome.
gintervals_2d_band_intersect : Intersect 2D intervals with a diagonal band.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()

Diagonal mode (intra-chromosomal pairs only):

>>> pm.gintervals_2d_all()

Full NxN chromosome pairs:

>>> pm.gintervals_2d_all(mode="full")

pymisha.gintervals_2d_band_intersect

gintervals_2d_band_intersect(intervals, band, intervals_set_out=None)

Intersect 2D intervals with a diagonal band.

Each 2D interval is intersected with a band parallel to the main diagonal (where x == y). For a pair (d1, d2), the band contains the points (x, y), with x in the first dimension and y in the second, that satisfy d1 <= x - y < d2. If the intersection is non-empty, the interval is shrunk to the minimal bounding rectangle of the intersection; otherwise the interval is removed.

Only cis (same-chromosome) intervals can intersect a band; trans intervals are removed.
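The clipping geometry can be sketched in plain Python. This illustrates the idea under stated assumptions and is not pymisha's code: `band_intersect` is a hypothetical helper, coordinates are taken to be half-open integers, and a point (x, y) is assumed to lie in the band when d1 <= x - y < d2, with x in the first dimension. Projecting the clipped region onto each axis yields the bounding rectangle.

```python
def band_intersect(rect, band):
    """Clip one 2D interval to the band d1 <= x - y < d2 (hypothetical helper).

    rect is (chrom1, start1, end1, chrom2, start2, end2) with half-open integer
    coordinates. Returns the bounding rectangle of the clipped region, or None
    when that region is empty or the interval is trans.
    """
    chrom1, s1, e1, chrom2, s2, e2 = rect
    d1, d2 = band
    if d1 >= d2:
        raise ValueError("band must satisfy d1 < d2")
    if chrom1 != chrom2:
        return None  # trans intervals never reach the diagonal band
    # Integer points obey s1 <= x <= e1 - 1, s2 <= y <= e2 - 1, d1 <= x - y <= d2 - 1.
    # Projecting onto each axis gives the new half-open bounds.
    new_s1 = max(s1, s2 + d1)
    new_e1 = min(e1, e2 + d2 - 1)
    new_s2 = max(s2, s1 - d2 + 1)
    new_e2 = min(e2, e1 - d1)
    if new_s1 >= new_e1 or new_s2 >= new_e2:
        return None
    return (chrom1, new_s1, new_e1, chrom2, new_s2, new_e2)


print(band_intersect(("chr1", 0, 100, "chr1", 0, 100), (10, 20)))
# ('chr1', 10, 100, 'chr1', 0, 90)
```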

PARAMETER	DESCRIPTION
`intervals`	2D intervals with columns chrom1, start1, end1, chrom2, start2, end2. TYPE: `DataFrame`
`band`	Pair (d1, d2) defining the diagonal band. d1 must be < d2. TYPE: `tuple of (int, int)`
`intervals_set_out`	If provided, save result as intervals set and return None. TYPE: `str` DEFAULT: `None`

RETURNS	DESCRIPTION
`DataFrame or None`	Intersected 2D intervals, or None if intervals_set_out is specified.

See Also

gintervals_2d : Create 2D intervals.
gintervals_2d_all : Return 2D intervals covering the whole genome.
gintervals_intersect : Intersect two 1D interval sets.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> intervs = pm.gintervals_2d(1)
>>> pm.gintervals_2d_band_intersect(intervs, (10000, 20000))

pymisha.gintervals_union

gintervals_union(intervals1, intervals2, intervals_set_out=None)

Calculate the union of two sets of intervals.

Returns intervals representing the genomic space covered by either intervals1 or intervals2. Overlapping and adjacent regions are merged in the result.
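One way to picture the union is "concatenate, sort, then merge overlapping or touching runs". The sketch below works on plain `(chrom, start, end)` tuples rather than DataFrames; `merge_intervals` and `union` are hypothetical helper names, chromosomes are ordered by plain string comparison, and an empty list stands in for pymisha's `None`.

```python
from itertools import groupby


def merge_intervals(intervals, unify_touching=True):
    """Sort (chrom, start, end) tuples and merge overlapping or touching runs."""
    merged = []
    for chrom, group in groupby(sorted(intervals), key=lambda iv: iv[0]):
        cur_start = cur_end = None
        for _, start, end in group:
            joins = cur_end is not None and (
                start < cur_end or (unify_touching and start == cur_end)
            )
            if joins:
                cur_end = max(cur_end, end)  # extend the current run
            else:
                if cur_end is not None:
                    merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        merged.append((chrom, cur_start, cur_end))
    return merged


def union(intervals1, intervals2):
    """Union = merged form of the concatenated sets."""
    return merge_intervals(list(intervals1) + list(intervals2))


print(union([("1", 0, 300), ("1", 500, 800)], [("1", 200, 400), ("1", 700, 900)]))
# [('1', 0, 400), ('1', 500, 900)]
```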

PARAMETER	DESCRIPTION
`intervals1`	First set of 1D intervals (chrom, start, end). TYPE: `DataFrame`
`intervals2`	Second set of 1D intervals (chrom, start, end). TYPE: `DataFrame`

RETURNS	DESCRIPTION
`DataFrame or None`	Union intervals sorted by chrom and start, or `None` if both inputs are empty.

See Also

gintervals_intersect : Intersection of two interval sets.
gintervals_diff : Difference of two interval sets.
gintervals_canonic : Merge overlapping intervals within one set.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> intervs1 = pm.gintervals("1", [0, 500], [300, 800])
>>> intervs2 = pm.gintervals("1", [200, 700], [400, 900])
>>> pm.gintervals_union(intervs1, intervs2)

pymisha.gintervals_intersect

gintervals_intersect(intervals1, intervals2, intervals_set_out=None)

Calculate the intersection of two sets of intervals.

Returns intervals representing the genomic space covered by both intervals1 and intervals2.
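A common way to compute such an intersection is a two-pointer sweep over both sets after each is sorted and merged. This is a sketch of that technique on `(chrom, start, end)` tuples, not necessarily how pymisha does it; `intersect` is a hypothetical name and an empty list plays the role of `None`.

```python
def intersect(intervals1, intervals2):
    """Intersect two lists of half-open (chrom, start, end) intervals."""

    def canonic(ivs):
        # Sort, then merge overlapping or touching intervals into disjoint runs.
        out = []
        for chrom, start, end in sorted(ivs):
            if out and out[-1][0] == chrom and start <= out[-1][2]:
                out[-1] = (chrom, out[-1][1], max(out[-1][2], end))
            else:
                out.append((chrom, start, end))
        return out

    a, b = canonic(intervals1), canonic(intervals2)
    i = j = 0
    result = []
    # Two-pointer sweep over both sorted lists.
    while i < len(a) and j < len(b):
        (ca, sa, ea), (cb, sb, eb) = a[i], b[j]
        if ca != cb:
            if ca < cb:
                i += 1
            else:
                j += 1
            continue
        lo, hi = max(sa, sb), min(ea, eb)
        if lo < hi:
            result.append((ca, lo, hi))
        # Advance past whichever interval ends first; the other may reach the next one.
        if ea < eb:
            i += 1
        else:
            j += 1
    return result


print(intersect([("1", 0, 500)], [("1", 300, 800)]))  # [('1', 300, 500)]
```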

PARAMETER	DESCRIPTION
`intervals1`	First set of 1D intervals (chrom, start, end). TYPE: `DataFrame`
`intervals2`	Second set of 1D intervals (chrom, start, end). TYPE: `DataFrame`

RETURNS	DESCRIPTION
`DataFrame or None`	Intersection intervals sorted by chrom and start, or `None` if the intersection is empty.

See Also

gintervals_union : Union of two interval sets.
gintervals_diff : Difference of two interval sets.
gintervals_2d_band_intersect : Intersect 2D intervals with a diagonal band.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> intervs1 = pm.gintervals("1", 0, 500)
>>> intervs2 = pm.gintervals("1", 300, 800)
>>> pm.gintervals_intersect(intervs1, intervs2)

pymisha.gintervals_diff

gintervals_diff(intervals1, intervals2, intervals_set_out=None)

Calculate the difference of two interval sets.

Returns genomic space covered by intervals1 but not by intervals2.
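The difference can be sketched as walking a cursor along each interval of the first set and emitting the gaps left between regions of the second set. This is a simple O(n·m) illustration on tuples, not pymisha's implementation; `diff` is a hypothetical name, and the sketch assumes intervals1 has no internal overlaps.

```python
def diff(intervals1, intervals2):
    """Parts of intervals1 not covered by intervals2 (half-open coordinates).

    Assumes intervals1 has no internal overlaps; O(n * m), for illustration only.
    """
    subtract = sorted(intervals2)
    result = []
    for chrom, start, end in sorted(intervals1):
        cursor = start  # first position not yet emitted or removed
        for c2, s2, e2 in subtract:
            if c2 != chrom or e2 <= cursor or s2 >= end:
                continue  # this region cannot touch what is left of the interval
            if s2 > cursor:
                result.append((chrom, cursor, s2))  # uncovered gap before the region
            cursor = max(cursor, e2)
            if cursor >= end:
                break
        if cursor < end:
            result.append((chrom, cursor, end))
    return result


print(diff([("1", 0, 500)], [("1", 200, 300)]))  # [('1', 0, 200), ('1', 300, 500)]
```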

PARAMETER	DESCRIPTION
`intervals1`	First set of 1D intervals (chrom, start, end). TYPE: `DataFrame`
`intervals2`	Second set of 1D intervals (chrom, start, end). TYPE: `DataFrame`

RETURNS	DESCRIPTION
`DataFrame or None`	Difference intervals sorted by chrom and start, or `None` if the result is empty.

See Also

gintervals_union : Union of two interval sets.
gintervals_intersect : Intersection of two interval sets.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> intervs1 = pm.gintervals("1", 0, 500)
>>> intervs2 = pm.gintervals("1", 200, 300)
>>> pm.gintervals_diff(intervs1, intervs2)

pymisha.gintervals_canonic

gintervals_canonic(intervals, unify_touching_intervals=True)

Convert intervals to canonical form.

Sorts intervals and merges overlapping ones. If unify_touching_intervals is True, adjacent intervals (where one's end equals another's start) are also merged. The result contains no overlapping intervals and is sorted by chromosome and start.

The result DataFrame also carries a mapping, stored in result.attrs['mapping'], that gives for each original interval index the index of the canonical interval containing it.
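The merge-plus-mapping idea can be sketched on plain tuples. This is illustrative only: `canonic_with_mapping` is a hypothetical helper, and representing the mapping as a list indexed by input position is an assumption; the exact structure pymisha stores in `attrs['mapping']` is not shown here.

```python
def canonic_with_mapping(intervals, unify_touching_intervals=True):
    """Merge (chrom, start, end) intervals and map each input to its merged interval.

    Returns (merged, mapping) where mapping[i] is the index in `merged` of the
    canonical interval that absorbed intervals[i].
    """
    order = sorted(range(len(intervals)), key=lambda i: intervals[i])
    merged, mapping = [], [None] * len(intervals)
    for i in order:
        chrom, start, end = intervals[i]
        if merged:
            m_chrom, m_start, m_end = merged[-1]
            touches = unify_touching_intervals and start == m_end
            if m_chrom == chrom and (start < m_end or touches):
                merged[-1] = (chrom, m_start, max(m_end, end))  # extend last run
                mapping[i] = len(merged) - 1
                continue
        merged.append((chrom, start, end))
        mapping[i] = len(merged) - 1
    return merged, mapping


print(canonic_with_mapping([("1", 0, 150), ("1", 200, 300), ("1", 100, 250)]))
# ([('1', 0, 300)], [0, 0, 0])
```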

PARAMETER	DESCRIPTION
`intervals`	Intervals to canonicalize (chrom, start, end). TYPE: `DataFrame`
`unify_touching_intervals`	Whether to merge touching (end == start) intervals. TYPE: `bool` DEFAULT: `True`

RETURNS	DESCRIPTION
`DataFrame or None`	Canonical intervals with `mapping` attribute, or `None` if input is empty.

See Also

gintervals_union : Union of two interval sets (implicitly canonicalizes).
gintervals_intersect : Intersection of two interval sets.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> intervs = pm.gintervals("1", [0, 200, 100], [150, 300, 250])
>>> result = pm.gintervals_canonic(intervs)
>>> result
>>> result.attrs['mapping']

pymisha.gintervals_force_range

gintervals_force_range(intervals, intervals_set_out=None)

Force intervals into valid chromosome ranges.

Clamps interval boundaries so that every interval lies within [0, chrom_length). Intervals that fall entirely outside chromosome ranges are removed.
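The clamping rule amounts to a max/min per interval followed by a non-empty check. In this sketch `force_range` is a hypothetical helper, `chrom_sizes` is a plain dict standing in for the genome database's chromosome sizes (the 500,000 bp length is invented for illustration), and an empty list stands in for `None`.

```python
def force_range(intervals, chrom_sizes):
    """Clamp (chrom, start, end) intervals to [0, chrom_length).

    chrom_sizes maps chromosome name -> length; an unknown chromosome raises KeyError.
    """
    result = []
    for chrom, start, end in intervals:
        length = chrom_sizes[chrom]
        new_start, new_end = max(start, 0), min(end, length)
        if new_start < new_end:  # drop intervals lying entirely outside the chromosome
            result.append((chrom, new_start, new_end))
    return result


sizes = {"1": 500_000}  # hypothetical chromosome length, for illustration only
print(force_range([("1", -100, 200), ("1", 10_000, 1_300_000), ("1", 600_000, 700_000)], sizes))
# [('1', 0, 200), ('1', 10000, 500000)]
```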

PARAMETER	DESCRIPTION
`intervals`	1D intervals with columns: chrom, start, end. TYPE: `DataFrame`

RETURNS	DESCRIPTION
`DataFrame or None`	Clamped intervals, or `None` if all intervals are out of range or the input is empty.

RAISES	DESCRIPTION
`ValueError`	If intervals is `None`.

See Also

gintervals : Create a set of 1D intervals.
gintervals_2d : Create a set of 2D intervals.
gintervals_canonic : Merge overlapping intervals.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> import pandas as pd
>>> intervs = pd.DataFrame({
...     "chrom": ["1", "1", "1", "1"],
...     "start": [11000, -100, 10000, 10500],
...     "end":   [12000, 200, 1300000, 10600],
... })
>>> pm.gintervals_force_range(intervs)

pymisha.gintervals_covered_bp

gintervals_covered_bp(intervals, src=None)

Compute total basepairs covered by intervals.

Overlapping intervals are merged before counting to avoid double-counting. When src is provided, only the portion of intervals that overlaps src is counted.
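The counting rule can be sketched as "merge, then sum lengths", or, when `src` is given, "merge both sets, then sum pairwise overlaps". Because both merged sets are disjoint, no base pair is counted twice. `covered_bp` here is a hypothetical tuple-based helper, not the library function.

```python
def covered_bp(intervals, src=None):
    """Base pairs covered by (chrom, start, end) intervals, optionally within src."""

    def merge(ivs):
        # Sort and merge overlapping or touching intervals into disjoint runs.
        out = []
        for chrom, start, end in sorted(ivs):
            if out and out[-1][0] == chrom and start <= out[-1][2]:
                out[-1] = (chrom, out[-1][1], max(out[-1][2], end))
            else:
                out.append((chrom, start, end))
        return out

    merged = merge(intervals)
    if src is None:
        return sum(end - start for _, start, end in merged)
    merged_src = merge(src)
    # Both sides are disjoint, so summing pairwise overlaps never double-counts.
    return sum(
        max(0, min(end, s_end) - max(start, s_start))
        for chrom, start, end in merged
        for s_chrom, s_start, s_end in merged_src
        if s_chrom == chrom
    )


print(covered_bp([("1", 0, 300), ("1", 200, 600)]))                         # 600
print(covered_bp([("1", 0, 300), ("1", 200, 600)], src=[("1", 100, 250)]))  # 150
```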

PARAMETER	DESCRIPTION
`intervals`	Interval set with columns: chrom, start, end. A string is interpreted as a saved interval-set name. TYPE: `DataFrame or str`
`src`	If provided, restrict counting to the intersection of intervals with src. TYPE: `DataFrame, str, or None` DEFAULT: `None`

RETURNS	DESCRIPTION
`int`	Total number of basepairs covered.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> intervs = pm.gintervals("1", [0, 200], [300, 600])
>>> pm.gintervals_covered_bp(intervs)  # [0, 300) and [200, 600) merge into [0, 600)
600

See Also

gintervals_coverage_fraction : Fraction of genomic space covered.
gintervals_canonic : Merge overlapping intervals.
gintervals : Create a set of 1D intervals.

pymisha.gintervals_coverage_fraction

gintervals_coverage_fraction(intervals1, intervals2=None)

Calculate the fraction of genomic space covered by intervals.

Returns the fraction of intervals2 (or the entire genome when intervals2 is None) that is covered by intervals1. Overlapping intervals in either set are unified before calculation.

PARAMETER	DESCRIPTION
`intervals1`	The covering set of 1D intervals (chrom, start, end). TYPE: `DataFrame`
`intervals2`	The reference space to measure against. `None` means the entire genome. TYPE: `DataFrame or None` DEFAULT: `None`

RETURNS	DESCRIPTION
`float`	A value between 0.0 and 1.0 representing the fraction of intervals2 (or the genome) covered by intervals1.

See Also

gintervals_covered_bp : Total base pairs covered by intervals.
gintervals_intersect : Intersection of two interval sets.
gintervals_all : Return full-genome intervals.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> intervs1 = pm.gintervals("1", 0, 100000)
>>> intervs2 = pm.gintervals(["1", "2"], 0, [100000, 100000])
>>> pm.gintervals_coverage_fraction(intervs1, intervs2)
>>> pm.gintervals_coverage_fraction(intervs1)

pymisha.gintervals_mark_overlaps

gintervals_mark_overlaps(intervals, group_col='overlap_group', unify_touching_intervals=True)

Mark groups of overlapping intervals with a shared group ID.

Each interval in the input is assigned an integer group identifier. Intervals that overlap (or touch, when unify_touching_intervals is True) share the same group ID.
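Grouping can be sketched as a single sweep in genomic order: each interval either extends the current run (and joins its group) or starts a new group. This pandas sketch is illustrative only; `mark_overlaps` is a hypothetical helper, the 1-based group numbering is an assumption (pymisha may number groups differently), and the input index is assumed to be unique.

```python
import pandas as pd


def mark_overlaps(df, group_col="overlap_group", unify_touching_intervals=True):
    """Return a copy of df with a group ID shared by chains of overlapping intervals."""
    group_ids = pd.Series(0, index=df.index, dtype="int64")
    group, cur_chrom, cur_end = 0, None, None
    # Walk intervals in genomic order; each one extends the current run or starts a new one.
    for idx, row in df.sort_values(["chrom", "start", "end"]).iterrows():
        extends = cur_chrom == row["chrom"] and (
            row["start"] < cur_end
            or (unify_touching_intervals and row["start"] == cur_end)
        )
        if extends:
            cur_end = max(cur_end, row["end"])
        else:
            group += 1
            cur_chrom, cur_end = row["chrom"], row["end"]
        group_ids.loc[idx] = group
    out = df.copy()
    out[group_col] = group_ids
    return out


intervs = pd.DataFrame({
    "chrom": ["1", "1", "1", "1"],
    "start": [11000, 100, 10000, 10500],
    "end": [12000, 200, 13000, 10600],
    "data": [10, 20, 30, 40],
})
print(mark_overlaps(intervs))  # groups, in input row order: 2, 1, 2, 2
```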

PARAMETER	DESCRIPTION
`intervals`	1D intervals with columns `chrom`, `start`, `end` and any additional data columns. TYPE: `DataFrame`
`group_col`	Name of the column to store group IDs. TYPE: `str` DEFAULT: `"overlap_group"`
`unify_touching_intervals`	Whether touching intervals (`end == start`) are considered overlapping. TYPE: `bool` DEFAULT: `True`

RETURNS	DESCRIPTION
`DataFrame`	The original intervals with an added column, named by group_col, holding the group IDs.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> import pandas as pd
>>> intervs = pd.DataFrame({
...     "chrom": ["1", "1", "1", "1"],
...     "start": [11000, 100, 10000, 10500],
...     "end":   [12000, 200, 13000, 10600],
...     "data":  [10, 20, 30, 40],
... })
>>> pm.gintervals_mark_overlaps(intervs)

See Also

gintervals_canonic : Merge overlapping intervals.
gintervals_intersect : Intersection of two interval sets.
gintervals_annotate : Annotate intervals with nearest-neighbor columns.

pymisha.gintervals_annotate

gintervals_annotate(intervals, annotation_intervals, annotation_columns=None, column_names=None, dist_column='dist', max_dist=float('inf'), na_value=numpy.nan, maxneighbors=1, tie_method='first', overwrite=False, keep_order=True, **kwargs)

Annotate intervals with columns from the nearest annotation intervals.

For each interval in intervals, the nearest neighbor in annotation_intervals is found (via `gintervals_neighbors`), and the specified annotation columns are copied over.

PARAMETER	DESCRIPTION
`intervals`	1D query intervals. TYPE: `DataFrame`
`annotation_intervals`	Source intervals containing annotation data. TYPE: `DataFrame`
`annotation_columns`	Columns to copy from annotation_intervals. `None` means all non-coordinate columns. TYPE: `list of str` DEFAULT: `None`
`column_names`	Output names for the annotation columns (must match length of annotation_columns). TYPE: `list of str` DEFAULT: `None`
`dist_column`	Name for the distance column. `None` to omit. TYPE: `str or None` DEFAULT: `"dist"`
`max_dist`	Maximum absolute distance. Annotations farther away are replaced with na_value. TYPE: `float` DEFAULT: `inf`
`na_value`	Fill value for annotations beyond max_dist or when no neighbor is found. Can be a dict mapping column names to individual fill values. TYPE: `scalar or dict` DEFAULT: `NaN`
`maxneighbors`	Number of nearest neighbors to consider. TYPE: `int` DEFAULT: `1`
`tie_method`	Tie-breaking strategy when multiple neighbors are equidistant; applies only when `maxneighbors > 1`. Options: `"first"` -- arbitrary but stable order (default); `"min.start"` -- prefer the neighbor with the smaller start coordinate; `"min.end"` -- prefer the neighbor with the smaller end coordinate. TYPE: `str` DEFAULT: `"first"`
`overwrite`	If `True`, allow annotation columns to overwrite existing columns in intervals. TYPE: `bool` DEFAULT: `False`
`keep_order`	Preserve original row order. TYPE: `bool` DEFAULT: `True`
`**kwargs`	Additional keyword arguments passed to `gintervals_neighbors` (e.g. `mindist`, `maxdist`). DEFAULT: `{}`

RETURNS	DESCRIPTION
`DataFrame`	The input intervals with added annotation and distance columns.

RAISES	DESCRIPTION
`ValueError`	If annotation columns conflict with existing columns and overwrite is `False`.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> intervs = pm.gintervals("1", [1000, 5000], [1100, 5050])
>>> ann = pm.gintervals("1", [900, 5400], [950, 5500])
>>> ann["remark"] = ["a", "b"]
>>> ann["score"] = [10.0, 20.0]
>>> pm.gintervals_annotate(intervs, ann)
>>> pm.gintervals_annotate(intervs, ann,
...     annotation_columns=["remark"],
...     column_names=["ann_remark"],
...     dist_column="ann_dist")
>>> pm.gintervals_annotate(intervs, ann,
...     annotation_columns=["remark"],
...     max_dist=200, na_value="no_ann")
>>> pm.gintervals_annotate(intervs, ann,
...     annotation_columns=["remark"],
...     maxneighbors=2,
...     tie_method="min.start")

See Also

gintervals_neighbors : Find nearest neighbors between interval sets.
gintervals_mark_overlaps : Mark groups of overlapping intervals.

pymisha.gintervals_normalize

gintervals_normalize(intervals, size, intervals_set_out=None)

Normalize intervals to a specified size by centering.

Each interval is resized to the target size while keeping its center position. Results are clamped to chromosome boundaries.
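The centering arithmetic can be sketched for a single interval. Assumptions, labeled in the code as well: `normalize` is a hypothetical helper, the midpoint is taken as the floor of (start + end) / 2 and half of an odd size is rounded down (pymisha's rounding may differ), and the 500,000 bp chromosome length is invented for illustration.

```python
def normalize(chrom, start, end, size, chrom_len):
    """Resize [start, end) to `size` bp around its midpoint, clamped to [0, chrom_len)."""
    center = (start + end) // 2  # assumption: floor of the midpoint
    new_start = center - size // 2
    new_end = new_start + size
    # Clamping near a chromosome edge can leave the interval shorter than `size`.
    return chrom, max(new_start, 0), min(new_end, chrom_len)


# One-to-many expansion: a single interval, several target sizes.
print([normalize("1", 1000, 2000, s, 500_000) for s in (500, 1000, 1500)])
# [('1', 1250, 1750), ('1', 1000, 2000), ('1', 750, 2250)]
```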

PARAMETER	DESCRIPTION
`intervals`	1D intervals with columns `chrom`, `start`, `end`. TYPE: `DataFrame`
`size`	Target interval size(s) in basepairs. Can be a single positive integer, in which case all intervals get this size; a vector with one entry per interval, in which case each interval gets its own target size; or, when intervals contains a single interval, a vector of any length, in which case that interval is replicated once per size (one-to-many expansion). TYPE: `int or array-like`

RETURNS	DESCRIPTION
`DataFrame`	Normalized intervals.

RAISES	DESCRIPTION
`ValueError`	If size contains non-positive values or if vector length does not match the number of intervals.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> intervs = pm.gintervals("1", [1000, 5000], [2000, 6000])
>>> pm.gintervals_normalize(intervs, 500)
>>> pm.gintervals_normalize(intervs, [500, 1000])
>>> pm.gintervals_normalize(pm.gintervals("1", 1000, 2000), [500, 1000, 1500])

See Also

gintervals_force_range : Clamp intervals to chromosome boundaries.
gintervals_window : Create intervals centered on positions.

pymisha.gintervals_neighbors

gintervals_neighbors(intervals1, intervals2, maxneighbors=1, mindist=-1000000000.0, maxdist=1000000000.0, na_if_notfound=False, use_intervals1_strand=False)

Find nearest neighbors between two sets of intervals.

For each interval in intervals1, finds the closest intervals from intervals2. Distance directionality can be determined by either the strand of the target intervals (intervals2, default) or the query intervals (intervals1).
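For the unstranded case, the search can be pictured as "compute a signed edge-to-edge distance to every target on the same chromosome, filter by the distance window, keep the closest". The sketch assumes a convention (0 for overlapping intervals, the negative gap for targets on the left, the positive gap for targets on the right) that is consistent with the `mindist`/`maxdist` descriptions but not confirmed to match pymisha's exact distance values; `signed_distance` and `nearest` are hypothetical helpers.

```python
def signed_distance(query, target):
    """Edge-to-edge distance from query to target, both (chrom, start, end) tuples.

    Assumed convention: 0 if the intervals overlap, negative if the target lies
    left of the query, positive if it lies right of it.
    """
    _, q_start, q_end = query
    _, t_start, t_end = target
    if t_end <= q_start:
        return t_end - q_start  # gap to a target on the left (<= 0)
    if t_start >= q_end:
        return t_start - q_end  # gap to a target on the right (>= 0)
    return 0


def nearest(query, targets, maxneighbors=1, mindist=-1e9, maxdist=1e9):
    """Up to `maxneighbors` same-chromosome targets within the window, closest first."""
    hits = []
    for target in targets:
        if target[0] != query[0]:
            continue
        dist = signed_distance(query, target)
        if mindist <= dist <= maxdist:
            hits.append((target, dist))
    hits.sort(key=lambda hit: abs(hit[1]))  # stable sort keeps input order on ties
    return hits[:maxneighbors]


print(nearest(("1", 5000, 5100), [("1", 3000, 3100), ("1", 7000, 7100)], maxneighbors=2))
# [(('1', 3000, 3100), -1900), (('1', 7000, 7100), 1900)]
```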

PARAMETER	DESCRIPTION
`intervals1`	Query intervals with columns 'chrom', 'start', 'end' (and optionally 'strand'). TYPE: `DataFrame`
`intervals2`	Target intervals to search for neighbors. TYPE: `DataFrame`
`maxneighbors`	Maximum number of neighbors to return per query interval. TYPE: `int` DEFAULT: `1`
`mindist`	Minimum distance (negative means target is upstream/left of query). TYPE: `float` DEFAULT: `-1e9`
`maxdist`	Maximum distance (positive means target is downstream/right of query). TYPE: `float` DEFAULT: `1e9`
`na_if_notfound`	If True, include queries with no neighbors (with NA values). TYPE: `bool` DEFAULT: `False`
`use_intervals1_strand`	If True, use the strand column of intervals1, rather than that of intervals2, to determine distance directionality. This is useful for TSS analysis, where upstream/downstream distances should be relative to gene direction. When True, for + strand queries a negative distance means upstream and a positive distance means downstream; for - strand queries, a negative distance means downstream and a positive distance means upstream. TYPE: `bool` DEFAULT: `False`

RETURNS	DESCRIPTION
`DataFrame or None`	DataFrame with query and neighbor coordinates plus distance column.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> query = pm.gintervals("1", [5000], [5100])
>>> targets = pm.gintervals("1", [3000, 7000], [3100, 7100])
>>> pm.gintervals_neighbors(query, targets)

See Also

gintervals_neighbors_upstream : Find upstream neighbors only.
gintervals_neighbors_downstream : Find downstream neighbors only.
gintervals_neighbors_directional : Find both upstream and downstream.
gintervals_annotate : Annotate intervals with nearest-neighbor columns.

pymisha.gintervals_neighbors_upstream

gintervals_neighbors_upstream(intervals1, intervals2, maxneighbors=1, maxdist=1000000000.0, na_if_notfound=False)

Find upstream neighbors of query intervals using strand directionality.

Upstream neighbors are those located in the 5' direction relative to the query strand: to the left (lower coordinates) of + strand queries and to the right (higher coordinates) of - strand queries.
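The strand rule reduces to a small classification on genomic coordinates. This sketch only decides which side a target is on; it does not reproduce the distance values pymisha reports. `direction` is a hypothetical helper, a strand of 0 is treated as +, and returning `"overlap"` for overlapping targets is a simplification, since how pymisha classifies those is not described here.

```python
def direction(query, target):
    """Classify target relative to a stranded query as upstream, downstream or overlap.

    query is (chrom, start, end, strand) with strand in {1, -1, 0}; 0 is treated
    as +. target is (chrom, start, end) on the same chromosome.
    """
    _, q_start, q_end, strand = query
    _, t_start, t_end = target
    if t_start < q_end and t_end > q_start:
        return "overlap"  # simplification: overlaps are reported separately here
    target_is_left = t_end <= q_start  # target sits at lower coordinates
    if strand == -1:
        # On the - strand the 5' (upstream) side faces higher coordinates.
        return "downstream" if target_is_left else "upstream"
    return "upstream" if target_is_left else "downstream"


print(direction(("1", 5000, 5100, 1), ("1", 3000, 3100)))   # upstream
print(direction(("1", 5000, 5100, -1), ("1", 3000, 3100)))  # downstream
```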

PARAMETER	DESCRIPTION
`intervals1`	Query intervals. If 'strand' column is present, it determines direction. Missing or strand=0 is treated as + strand. TYPE: `DataFrame`
`intervals2`	Target intervals to search for neighbors. TYPE: `DataFrame`
`maxneighbors`	Maximum number of upstream neighbors to return per query. TYPE: `int` DEFAULT: `1`
`maxdist`	Maximum distance to search for neighbors (in bp). TYPE: `float` DEFAULT: `1e9`
`na_if_notfound`	If True, include queries with no neighbors (with NA values). TYPE: `bool` DEFAULT: `False`

RETURNS	DESCRIPTION
`DataFrame or None`	DataFrame with query and neighbor coordinates plus distance column. Distance values are always <= 0 (upstream direction).

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> query = pm.gintervals("1", [5000], [5100])
>>> query["strand"] = 1  # + strand
>>> targets = pm.gintervals("1", [3000, 7000], [3100, 7100])
>>> pm.gintervals_neighbors_upstream(query, targets)

See Also

gintervals_neighbors : General neighbor finding.
gintervals_neighbors_downstream : Find downstream neighbors.
gintervals_neighbors_directional : Find both upstream and downstream.

pymisha.gintervals_neighbors_downstream

gintervals_neighbors_downstream(intervals1, intervals2, maxneighbors=1, maxdist=1000000000.0, na_if_notfound=False)

Find downstream neighbors of query intervals using strand directionality.

Downstream neighbors are those located in the 3' direction relative to the query strand: to the right (higher coordinates) of + strand queries and to the left (lower coordinates) of - strand queries.

PARAMETER	DESCRIPTION
`intervals1`	Query intervals. If 'strand' column is present, it determines direction. Missing or strand=0 is treated as + strand. TYPE: `DataFrame`
`intervals2`	Target intervals to search for neighbors. TYPE: `DataFrame`
`maxneighbors`	Maximum number of downstream neighbors to return per query. TYPE: `int` DEFAULT: `1`
`maxdist`	Maximum distance to search for neighbors (in bp). TYPE: `float` DEFAULT: `1e9`
`na_if_notfound`	If True, include queries with no neighbors (with NA values). TYPE: `bool` DEFAULT: `False`

RETURNS	DESCRIPTION
`DataFrame or None`	DataFrame with query and neighbor coordinates plus distance column. Distance values are always >= 0 (downstream direction).

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> query = pm.gintervals("1", [5000], [5100])
>>> query["strand"] = 1  # + strand
>>> targets = pm.gintervals("1", [3000, 7000], [3100, 7100])
>>> pm.gintervals_neighbors_downstream(query, targets)

See Also

gintervals_neighbors : General neighbor finding.
gintervals_neighbors_upstream : Find upstream neighbors.
gintervals_neighbors_directional : Find both upstream and downstream.

pymisha.gintervals_neighbors_directional

gintervals_neighbors_directional(intervals1, intervals2, maxneighbors_upstream=1, maxneighbors_downstream=1, maxdist=1000000000.0, na_if_notfound=False)

Find both upstream and downstream neighbors of query intervals.

Convenience function that returns both upstream and downstream neighbors in a single call.

PARAMETER	DESCRIPTION
`intervals1`	Query intervals. If 'strand' column is present, it determines direction. Missing or strand=0 is treated as + strand. TYPE: `DataFrame`
`intervals2`	Target intervals to search for neighbors. TYPE: `DataFrame`
`maxneighbors_upstream`	Maximum number of upstream neighbors to return per query. TYPE: `int` DEFAULT: `1`
`maxneighbors_downstream`	Maximum number of downstream neighbors to return per query. TYPE: `int` DEFAULT: `1`
`maxdist`	Maximum distance to search for neighbors (in bp). TYPE: `float` DEFAULT: `1e9`
`na_if_notfound`	If True, include queries with no neighbors (with NA values). TYPE: `bool` DEFAULT: `False`

RETURNS	DESCRIPTION
`dict`	Dictionary with keys 'upstream' and 'downstream', each containing a DataFrame (or None) with neighbor results.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> query = pm.gintervals("1", [5000], [5100])
>>> query["strand"] = 1
>>> targets = pm.gintervals("1", [3000, 7000], [3100, 7100])
>>> result = pm.gintervals_neighbors_directional(query, targets)
>>> result["upstream"]
>>> result["downstream"]

See Also

gintervals_neighbors : General neighbor finding.
gintervals_neighbors_upstream : Find upstream neighbors only.
gintervals_neighbors_downstream : Find downstream neighbors only.

pymisha.gintervals_random

gintervals_random(size, n, dist_from_edge=3000000, chromosomes=None, mask=None, **kwargs)

Generate random genomic intervals.

Intervals are sampled uniformly from the genome after excluding chromosome edges and any mask regions. Each interval is exactly size basepairs.
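Uniform sampling over the allowed genome can be sketched by weighting each chromosome by its number of valid start positions and then drawing a start uniformly within it. This is an illustration, not pymisha's sampler: `random_intervals` and its `chrom_sizes` argument are hypothetical, the `mask` exclusion is not implemented, and NumPy's `default_rng` is used here without any claim about which random generator pymisha uses.

```python
import numpy as np


def random_intervals(size, n, chrom_sizes, dist_from_edge=3_000_000, seed=None):
    """Sample n intervals of `size` bp uniformly over the edge-trimmed genome."""
    rng = np.random.default_rng(seed)
    chroms, spans = [], []
    for chrom, length in chrom_sizes.items():
        # Valid starts lie in [dist_from_edge, length - dist_from_edge - size].
        span = length - 2 * dist_from_edge - size + 1
        if span > 0:
            chroms.append(chrom)
            spans.append(span)
    if not chroms:
        raise ValueError("no chromosome is long enough for these settings")
    spans = np.asarray(spans, dtype=np.int64)
    # Weighting chromosomes by their number of valid starts makes every start equally likely.
    picks = rng.choice(len(chroms), size=n, p=spans / spans.sum())
    starts = dist_from_edge + rng.integers(0, spans[picks])
    return [(chroms[i], int(s), int(s) + size) for i, s in zip(picks, starts)]


print(random_intervals(100, 5, {"1": 500_000, "2": 300_000}, dist_from_edge=10_000, seed=42))
```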

PARAMETER	DESCRIPTION
`size`	Interval size in basepairs (must be positive). TYPE: `int`
`n`	Number of intervals to generate (must be positive). TYPE: `int`
`dist_from_edge`	Minimum distance from chromosome boundaries. TYPE: `float` DEFAULT: `3_000_000`
`chromosomes`	Restrict sampling to these chromosomes. TYPE: `list of str` DEFAULT: `None`
`mask`	Intervals to exclude from sampling (columns `chrom`, `start`, `end`). TYPE: `DataFrame` DEFAULT: `None`
`filter`	Backward-compatible alias for `mask`. TYPE: `DataFrame`

RETURNS	DESCRIPTION
`DataFrame`	DataFrame with columns `chrom`, `start`, `end`.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> pm.gintervals_random(100, 1000)
>>> pm.gintervals_random(100, 1000, chromosomes=["1"])
>>> import numpy as np; np.random.seed(42)
>>> pm.gintervals_random(100, 50)

See Also

gintervals : Create intervals manually.
gintervals_all : Return full-genome intervals.

pymisha.gintervals_from_tuples

gintervals_from_tuples(rows, strand=None)

Create intervals from a list of tuples or dicts.

Each tuple should be (chrom, start, end) or (chrom, start, end, strand). Alternatively, each element can be a dict with the corresponding keys.

PARAMETER	DESCRIPTION
`rows`	Interval specifications. Tuples must have 3 or 4 elements. TYPE: `list of tuple or list of dict`
`strand`	Strand values to assign when the tuples do not include strand. TYPE: `int or list of int` DEFAULT: `None`

RETURNS	DESCRIPTION
`DataFrame or None`	Sorted intervals with columns: chrom, start, end (and optionally strand). Returns `None` if rows is `None`.

See Also

gintervals : Create intervals from parallel arrays.
gintervals_from_strings : Create intervals from region strings.
gintervals_from_bed : Create intervals from a BED file.
gintervals_all : Return full-chromosome intervals.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> pm.gintervals_from_tuples([("1", 100, 200), ("1", 250, 300)])

pymisha.gintervals_from_strings

gintervals_from_strings(regions)

Create intervals from region strings.

Parses strings of the form "chr1:100-200" or "chr1:100-200:+" into an intervals DataFrame. If only a chromosome name is given (e.g. "chr1"), the full chromosome extent is used.
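The three accepted forms can be captured by one regular expression. This is a hypothetical parser, not pymisha's: `parse_region` is an invented name, `+`/`-` are assumed to map to `1`/`-1` (the encoding `gintervals` uses for strand), and for a bare chromosome name it returns `None` coordinates because resolving the full extent requires the genome's chromosome sizes.

```python
import re

# chrom, optionally followed by :start-end and then optionally by :+ or :-
_REGION = re.compile(
    r"^(?P<chrom>[^:]+)(?::(?P<start>\d+)-(?P<end>\d+)(?::(?P<strand>[+-]))?)?$"
)


def parse_region(region):
    """Hypothetical parser returning (chrom, start, end, strand)."""
    m = _REGION.match(region.strip())
    if m is None:
        raise ValueError(f"cannot parse region string: {region!r}")
    if m["start"] is None:
        # Whole chromosome: the extent would come from the genome's chromosome sizes.
        return (m["chrom"], None, None, None)
    strand = {"+": 1, "-": -1}.get(m["strand"])  # None when no strand suffix
    return (m["chrom"], int(m["start"]), int(m["end"]), strand)


print([parse_region(r) for r in ["1:100-200", "1:300-400:-"]])
# [('1', 100, 200, None), ('1', 300, 400, -1)]
```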

PARAMETER	DESCRIPTION
`regions`	One or more region strings. Accepted formats: `"chrom"` -- full chromosome; `"chrom:start-end"` -- region without strand; `"chrom:start-end:+"` or `"chrom:start-end:-"` -- region with strand. TYPE: `str or list of str`

RETURNS	DESCRIPTION
`DataFrame`	Sorted intervals with columns: chrom, start, end (and optionally strand).

RAISES	DESCRIPTION
`ValueError`	If a region string cannot be parsed.

See Also

gintervals : Create intervals from parallel arrays.
gintervals_from_tuples : Create intervals from a list of tuples.
gintervals_from_bed : Create intervals from a BED file.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> pm.gintervals_from_strings(["1:100-200", "1:300-400:-"])

pymisha.gintervals_from_bed

gintervals_from_bed(path, has_strand=False)

Create intervals from a BED-like file.

Reads a tab- or space-delimited file with at least three columns (chrom, start, end) and returns a sorted intervals DataFrame.
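A reader of this kind can be sketched with `pandas.read_csv` and a whitespace separator. The sketch makes several assumptions: `read_bed` is a hypothetical name; lines starting with `#` are skipped, but UCSC `track`/`browser` header lines are not handled; BED strand values `+`, `-` and `.` are encoded as `1`, `-1` and `0`; and an empty file raises an error instead of returning `None`.

```python
import io

import pandas as pd


def read_bed(source, has_strand=False):
    """Read a whitespace-delimited BED-like file into a sorted intervals frame."""
    raw = pd.read_csv(source, sep=r"\s+", header=None, comment="#", dtype={0: str})
    out = raw.iloc[:, :3].copy()
    out.columns = ["chrom", "start", "end"]
    if has_strand and raw.shape[1] >= 6:
        # BED column 6 holds '+', '-' or '.'; encode them as 1, -1 and 0.
        out["strand"] = raw.iloc[:, 5].map({"+": 1, "-": -1}).fillna(0).astype(int)
    return out.sort_values(["chrom", "start"], ignore_index=True)


bed = io.StringIO("1\t500\t600\tb\t0\t-\n1\t100\t200\ta\t0\t+\n")
print(read_bed(bed, has_strand=True))
```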

PARAMETER	DESCRIPTION
`path`	Path to BED file (chrom, start, end[, ...]). TYPE: `str or Path`
`has_strand`	If True, use column 6 for strand when present. TYPE: `bool` DEFAULT: `False`

RETURNS	DESCRIPTION
`DataFrame or None`	Sorted intervals with columns: chrom, start, end (and optionally strand). Returns `None` if the file contains no intervals.

RAISES	DESCRIPTION
`FileNotFoundError`	If path does not exist.

See Also

gintervals : Create intervals from parallel arrays.
gintervals_from_tuples : Create intervals from a list of tuples.
gintervals_from_strings : Create intervals from region strings.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> pm.gintervals_from_bed("example.bed")

pymisha.gintervals_import_genes

gintervals_import_genes(genes_file, annots_file=None, annots_names=None)

Import gene annotations from a UCSC knownGene-format file.

Reads gene definitions from genes_file and produces four sets of intervals: TSS, exons, 3'UTR, and 5'UTR. A strand column is included (1 for "+", -1 for "-").

If annots_file is provided, annotations are attached to the intervals. annots_names must be supplied when annots_file is given.

Both genes_file and annots_file may be local file paths or URLs (http, https, ftp). Gzipped files (.gz) are handled automatically.

Overlapping intervals within each set are unified (merged). When two overlapping intervals have different strands, the merged strand is set to 0. Annotations from overlapping intervals are concatenated with semicolons; duplicate annotation values are removed.
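The merge rules just described (conflicting strands collapse to 0, annotations joined with semicolons without duplicates) can be sketched on a list of dicts. This is illustrative only: `merge_gene_intervals` is hypothetical, it carries a single annotation field named `annot`, and it assumes that intervals which merely touch are not merged.

```python
def merge_gene_intervals(intervals):
    """Merge overlapping intervals of one set (hypothetical sketch).

    intervals: list of dicts with chrom, start, end, strand and an 'annot' string.
    Conflicting strands collapse to 0; annotations are joined with ';' without
    duplicates, in first-seen order. Touching intervals are not merged.
    """
    merged = []
    for iv in sorted(intervals, key=lambda d: (d["chrom"], d["start"], d["end"])):
        last = merged[-1] if merged else None
        if last and last["chrom"] == iv["chrom"] and iv["start"] < last["end"]:
            last["end"] = max(last["end"], iv["end"])
            if last["strand"] != iv["strand"]:
                last["strand"] = 0  # strands disagree
            for value in iv["annot"].split(";"):
                if value not in last["annot"].split(";"):
                    last["annot"] += ";" + value
        else:
            merged.append(dict(iv))  # copy so the input is not modified
    return merged


genes = [
    {"chrom": "1", "start": 100, "end": 200, "strand": 1, "annot": "A"},
    {"chrom": "1", "start": 150, "end": 300, "strand": -1, "annot": "B"},
    {"chrom": "1", "start": 500, "end": 600, "strand": 1, "annot": "C"},
]
print(merge_gene_intervals(genes))
```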

PARAMETER	DESCRIPTION
`genes_file`	Path or URL to a knownGene-format file (12 tab-separated columns). TYPE: `str`
`annots_file`	Path or URL to an annotation file. The first column is the gene ID (matching `genes_file`), followed by annotation columns. TYPE: `str` DEFAULT: `None`
`annots_names`	Names for the annotation columns. Required when `annots_file` is given. The length must match the number of columns in the annotation file. TYPE: `list of str` DEFAULT: `None`

RETURNS	DESCRIPTION
`dict`	Dictionary with keys `"tss"`, `"exons"`, `"utr3"`, `"utr5"`. Each value is a `pandas.DataFrame` with columns `chrom`, `start`, `end`, `strand` (and any annotation columns), or `None` if the corresponding set is empty.

RAISES	DESCRIPTION
`ValueError`	If `genes_file` is None, or `annots_file` is given without `annots_names`, or file parsing fails.

See Also

gintervals : Create a custom set of 1D intervals.
gintervals_save : Save intervals to the database.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> result = pm.gintervals_import_genes("genes.txt")
>>> sorted(result.keys())
['exons', 'tss', 'utr3', 'utr5']

pymisha.gintervals_window ¶

gintervals_window(chroms, centers, half_width)

Create intervals centered on positions with fixed half-width.

Constructs intervals of width 2 * half_width centered on each position in centers.
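A minimal sketch of the window arithmetic, assuming each window spans `[center - half_width, center + half_width)`. `window_intervals` is hypothetical: it broadcasts scalars as described above, but it does not clip windows to chromosome bounds and it sorts lexicographically rather than in genome order.

```python
def window_intervals(chroms, centers, half_width):
    """Return sorted (chrom, start, end) windows of width 2 * half_width.

    Hypothetical helper. Assumes each window is
    [center - half_width, center + half_width) and does not clip to
    chromosome bounds.
    """
    chroms = [chroms] if isinstance(chroms, (str, int)) else list(chroms)
    centers = [centers] if isinstance(centers, int) else list(centers)
    n = max(len(chroms), len(centers))
    # Broadcast a single value to the length of the other argument.
    if len(chroms) == 1:
        chroms = chroms * n
    if len(centers) == 1:
        centers = centers * n
    if len(chroms) != len(centers):
        raise ValueError("chroms and centers have incompatible lengths")
    windows = [(str(c), x - half_width, x + half_width) for c, x in zip(chroms, centers)]
    # Lexicographic sort for illustration only.
    return sorted(windows)


print(window_intervals("1", [1000, 2000], half_width=50))
# [('1', 950, 1050), ('1', 1950, 2050)]
```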

PARAMETER	DESCRIPTION
`chroms`	Chromosome name(s). Scalar is broadcast to match centers. TYPE: `str, int, or list`
`centers`	Center positions. Scalar is broadcast to match chroms. TYPE: `int or list of int`
`half_width`	Half the desired interval width. TYPE: `int`

RETURNS	DESCRIPTION
`DataFrame`	Sorted intervals with columns: chrom, start, end.

See Also

gintervals : Create intervals from explicit start/end coordinates. gintervals_normalize : Resize intervals by centering.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> pm.gintervals_window("1", [1000, 2000], half_width=50)

pymisha.gintervals_ls ¶

gintervals_ls(pattern='', ignore_case=False)

List named interval sets in the database.

PARAMETER	DESCRIPTION
`pattern`	Regular expression pattern to filter interval set names. Empty string matches all sets. TYPE: `str` DEFAULT: `""`
`ignore_case`	If True, pattern matching is case-insensitive. TYPE: `bool` DEFAULT: `False`

RETURNS	DESCRIPTION
`list of str`	Names of interval sets matching the pattern.
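The filter can be thought of as a regular-expression search over set names. The sketch below assumes unanchored `re.search` semantics, which is consistent with an empty pattern matching every name; `filter_set_names` and the name list are hypothetical.

```python
import re


def filter_set_names(names, pattern="", ignore_case=False):
    """Keep names in which `pattern` matches anywhere (hypothetical helper)."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    # An empty pattern matches every string, so all names are kept.
    return [name for name in names if regex.search(name)]


names = ["annotations", "Annot.genes", "my_intervals"]
print(filter_set_names(names, "annot.*"))                    # ['annotations']
print(filter_set_names(names, "annot.*", ignore_case=True))  # ['annotations', 'Annot.genes']
```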

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> pm.gintervals_ls()
>>> pm.gintervals_ls("annot.*")

See Also

gintervals_exists : Check if a named interval set exists. gintervals_load : Load a named interval set. gintervals_save : Save intervals as a named set. gintervals_rm : Remove a named interval set.

pymisha.gintervals_exists ¶

gintervals_exists(name)

Check if a named interval set exists.

PARAMETER	DESCRIPTION
`name`	Name of the interval set to check. TYPE: `str`

RETURNS	DESCRIPTION
`bool`	True if the interval set exists, False otherwise.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> pm.gintervals_exists("annotations")
True

See Also

gintervals_ls : List named interval sets. gintervals_load : Load a named interval set. gintervals_save : Save intervals as a named set. gintervals_rm : Remove a named interval set.

pymisha.gintervals_dataset ¶

gintervals_dataset(intervals=None)

Return the database/dataset root path for a named interval set.

Searches the user root, genome root, and all linked datasets for the given interval set name.

PARAMETER	DESCRIPTION
`intervals`	Name of the interval set (e.g. `"annotations"`). TYPE: `str` DEFAULT: `None`

RETURNS	DESCRIPTION
`str or None`	The root path of the database/dataset containing the interval set, or `None` if the set is not found.

RAISES	DESCRIPTION
`ValueError`	If intervals is `None`.

See Also

gintervals_exists : Check if a named interval set exists. gintervals_ls : List named interval sets. gintervals_load : Load a named interval set.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> pm.gintervals_dataset("annotations")
'...trackdb/test'

pymisha.gintervals_chrom_sizes ¶

gintervals_chrom_sizes(intervals)

Get chromosome sizes for intervals.

PARAMETER	DESCRIPTION
`intervals`	Intervals with 'chrom' column. TYPE: `DataFrame`

RETURNS	DESCRIPTION
`DataFrame`	DataFrame with 'chrom' column containing unique chromosomes present in the input intervals.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> intervals = pm.gintervals(["1", "2"], [0, 0], [10000, 20000])
>>> pm.gintervals_chrom_sizes(intervals)

See Also

gintervals_load : Load a named interval set. gintervals_exists : Check if a named interval set exists. gintervals_ls : List named interval sets.

pymisha.gintervals_load ¶

gintervals_load(intervals_set, chrom=None, chrom1=None, chrom2=None, progress=False)

Load a named interval set from the database.

PARAMETER	DESCRIPTION
`intervals_set`	Name of the interval set to load (e.g., "annotations", "genes.coding"). TYPE: `str`
`chrom`	If specified, only load intervals from this chromosome. TYPE: `str` DEFAULT: `None`
`chrom1`	If specified, load only intervals whose first chromosome (`chrom1`) equals this value (2D sets only). TYPE: `str` DEFAULT: `None`
`chrom2`	If specified, load only intervals whose second chromosome (`chrom2`) equals this value (2D sets only). TYPE: `str` DEFAULT: `None`

RETURNS	DESCRIPTION
`DataFrame or None`	DataFrame with columns 'chrom', 'start', 'end' plus any additional columns stored in the interval set. Returns None if no intervals match.

RAISES	DESCRIPTION
`ValueError`	If the interval set does not exist.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> intervals = pm.gintervals_load("annotations")
>>> intervals = pm.gintervals_load("annotations", chrom="1")

See Also

gintervals_save : Save intervals as a named set. gintervals_update : Update a chromosome in an existing set. gintervals_exists : Check if a named interval set exists. gintervals_ls : List named interval sets. gintervals_rm : Remove a named interval set.

pymisha.gintervals_save ¶

gintervals_save(intervals, intervals_set)

Save intervals to the database as a named interval set.

PARAMETER	DESCRIPTION
`intervals`	Intervals to save. Must have either 'chrom', 'start', 'end' columns (1D) or 'chrom1', 'start1', 'end1', 'chrom2', 'start2', 'end2' columns (2D). TYPE: `DataFrame`
`intervals_set`	Name for the interval set. Must start with a letter and contain only alphanumeric characters, underscores, and dots. TYPE: `str`

RAISES	DESCRIPTION
`ValueError`	If the interval set name is invalid or already exists.
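The documented naming rule maps onto a simple regular expression. This sketch checks only that rule (ASCII letters assumed), not whether the set already exists. `is_valid_set_name` is a hypothetical helper, and pymisha's own validation may differ in detail.

```python
import re

# Documented rule: a leading ASCII letter, then letters, digits, '_' or '.'.
SET_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9_.]*")


def is_valid_set_name(name):
    """Return True if `name` satisfies the documented naming rule."""
    return SET_NAME_RE.fullmatch(name) is not None


for candidate in ["my_intervals", "genes.coding", "1genes", "my-intervals"]:
    print(candidate, is_valid_set_name(candidate))
```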

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> intervals = pm.gintervals(["1", "2"], [100, 200], [1000, 2000])
>>> pm.gintervals_save(intervals, "my_intervals")

RETURNS	DESCRIPTION
`None`

See Also

gintervals_load : Load a named interval set. gintervals_update : Update a chromosome in an existing set. gintervals_exists : Check if a named interval set exists. gintervals_ls : List named interval sets. gintervals_rm : Remove a named interval set.

pymisha.gintervals_update ¶

gintervals_update(intervals_set, intervals, chrom=None)

Update intervals for a specific chromosome in an existing intervals set.

Replaces all intervals for the given chromosome with the new intervals. Pass intervals=None to delete all intervals for that chromosome.
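In memory, the replace-or-delete semantics amount to dropping every row on the target chromosome and appending the new rows. A minimal sketch over `(chrom, start, end)` tuples; `update_chrom` is hypothetical and does not touch the database.

```python
def update_chrom(rows, chrom, new_rows=None):
    """Replace (or delete, if new_rows is None) all rows on `chrom`.

    Hypothetical in-memory helper over (chrom, start, end) tuples.
    """
    if chrom is None:
        raise ValueError("chrom must be specified")
    # Drop every existing interval on the target chromosome.
    kept = [row for row in rows if row[0] != chrom]
    if new_rows is not None:
        kept.extend(new_rows)
    return sorted(kept)


rows = [("1", 0, 10000), ("2", 0, 10000)]
print(update_chrom(rows, "2", [("2", 500, 5000)]))  # [('1', 0, 10000), ('2', 500, 5000)]
print(update_chrom(rows, "2"))                      # [('1', 0, 10000)]
```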

PARAMETER	DESCRIPTION
`intervals_set`	Name of the existing intervals set. TYPE: `str`
`intervals`	New intervals for the chromosome, or None to delete. TYPE: `DataFrame or None`
`chrom`	Chromosome to update. Required. TYPE: `str` DEFAULT: `None`

RETURNS	DESCRIPTION
`None`

RAISES	DESCRIPTION
`ValueError`	If the intervals set does not exist or `chrom` is not specified.

See Also

gintervals_save : Save a new interval set. gintervals_load : Load a named interval set. gintervals_exists : Check if a named interval set exists. gintervals_ls : List named interval sets.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> intervs = pm.gintervals(["1", "2"], [0, 0], [10000, 10000])
>>> pm.gintervals_save(intervs, "testintervs")
>>> pm.gintervals_update("testintervs", pm.gintervals("2", 500, 5000), chrom="2")
>>> pm.gintervals_rm("testintervs", force=True)

pymisha.gintervals_rm ¶

gintervals_rm(intervals_set, force=False)

Remove a named interval set from the database.

PARAMETER	DESCRIPTION
`intervals_set`	Name of the interval set to remove. TYPE: `str`
`force`	If True, do not raise an error if the interval set does not exist. TYPE: `bool` DEFAULT: `False`

RAISES	DESCRIPTION
`ValueError`	If the interval set does not exist and force is False.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> pm.gintervals_rm("my_intervals")

RETURNS	DESCRIPTION
`None`

See Also

gintervals_save : Save intervals as a named set. gintervals_load : Load a named interval set. gintervals_exists : Check if a named interval set exists. gintervals_ls : List named interval sets.

pymisha.gintervals_rbind ¶

gintervals_rbind(*intervals, intervals_set_out=None)

Concatenate interval sets (DataFrames and/or named interval-set strings).

PARAMETER	DESCRIPTION
`*intervals`	One or more interval sets. Each argument can be a DataFrame or the name of an interval set (loaded via `gintervals_load`). TYPE: `DataFrame or str` DEFAULT: `()`
`intervals_set_out`	If provided, save the concatenated intervals via `gintervals_save` and return `None`. TYPE: `str` DEFAULT: `None`

RETURNS	DESCRIPTION
`DataFrame or None`	Concatenated intervals when intervals_set_out is `None`. Otherwise returns `None` after saving.

RAISES	DESCRIPTION
`ValueError`	If no interval arguments are provided, if an interval set does not exist, or if columns do not match exactly.
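The concatenation and its column check can be modeled as follows. This sketch assumes that "match exactly" means the same column names in the same order and that row order is preserved. Tables are hypothetical `(columns, rows)` pairs rather than DataFrames, and `rbind_tables` is not a pymisha function.

```python
def rbind_tables(*tables):
    """Concatenate (columns, rows) interval tables with identical columns.

    Hypothetical helper; assumes "match exactly" means the same column
    names in the same order.
    """
    if not tables:
        raise ValueError("at least one interval set is required")
    columns = list(tables[0][0])
    combined = []
    for cols, rows in tables:
        if list(cols) != columns:
            raise ValueError(f"column mismatch: {list(cols)} != {columns}")
        combined.extend(rows)
    return columns, combined


a = (["chrom", "start", "end"], [("1", 1000, 4000), ("2", 1000, 4000)])
b = (["chrom", "start", "end"], [("2", 2000, 5000), ("X", 2000, 5000)])
cols, rows = rbind_tables(a, b)
print(cols, len(rows))  # ['chrom', 'start', 'end'] 4
```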

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> i1 = pm.gextract("sparse_track", pm.gintervals(["1", "2"], 1000, 4000))
>>> i2 = pm.gextract("sparse_track", pm.gintervals(["2", "X"], 2000, 5000))
>>> pm.gintervals_save(i2, "tmp_intervs")
>>> pm.gintervals_rbind(i1, "tmp_intervs")
>>> pm.gintervals_rm("tmp_intervs", force=True)

See Also

gintervals_load : Load a named interval set. gintervals_save : Save intervals as a named set. gintervals_canonic : Merge overlapping intervals within one set.

pymisha.gintervals_mapply ¶

gintervals_mapply(func, *exprs, intervals=None, iterator=None, intervals_set_out=None, colnames='value')

Apply a function to track expression values for each interval.

Evaluates track expressions for each interval and passes the resulting value arrays to func. The return value of func becomes a new column in the output.
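The data flow can be illustrated with a toy dense track. This sketch is not how pymisha evaluates expressions. A track is modeled as a dict mapping each chromosome to per-bin values for fixed-size bins starting at 0, only one expression is handled, and `func` receives a Python list instead of a numpy array. `mapply_sketch` and the data are hypothetical, and returning NaN for intervals that cover no bins is a choice of the sketch.

```python
def mapply_sketch(func, track, bin_size, intervals, colname="value"):
    """Apply `func` to the bin values of a single track within each interval.

    Hypothetical model: `track` maps chrom -> list of per-bin values for
    consecutive bins of `bin_size` starting at position 0.
    """
    out = []
    for chrom, start, end in intervals:
        values = track[chrom]
        first = start // bin_size                     # bin containing `start`
        last = min(len(values), -(-end // bin_size))  # ceil(end / bin_size)
        chunk = values[first:last]
        # Sketch choice: report NaN for intervals that cover no bins.
        result = func(chunk) if chunk else float("nan")
        out.append({"chrom": chrom, "start": start, "end": end, colname: result})
    return out


track = {"1": [0.1, 0.5, 0.3, 0.9], "2": [0.2, 0.4]}
intervals = [("1", 0, 100), ("1", 100, 200), ("2", 0, 100)]
for row in mapply_sketch(max, track, 50, intervals):
    print(row)
```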

PARAMETER	DESCRIPTION
`func`	Function to apply. Receives one numpy array per track expression. TYPE: `callable`
`*exprs`	Track expressions to evaluate. TYPE: `str` DEFAULT: `()`
`intervals`	Intervals to process. TYPE: `DataFrame` DEFAULT: `None`
`iterator`	Track expression iterator. TYPE: `optional` DEFAULT: `None`
`intervals_set_out`	If given, save result as an intervals set and return None. TYPE: `str` DEFAULT: `None`
`colnames`	Name of the result column. TYPE: `str` DEFAULT: `"value"`

RETURNS	DESCRIPTION
`DataFrame or None`	Intervals with an additional column containing func results, or None if intervals_set_out is specified.

See Also

giterator_intervals : Inspect iterator bin boundaries.

Examples:

>>> import pymisha as pm
>>> import numpy as np
>>> _ = pm.gdb_init_examples()
>>> pm.gintervals_mapply(
...     np.max, "dense_track",
...     intervals=pm.gintervals(["1", "2"], 0, 10000),
... )

pymisha.gintervals_convert_to_indexed ¶

gintervals_convert_to_indexed(set_name, remove_old=False, force=False)

Convert a 1D big interval set to indexed format.

Converts per-chromosome interval files into a single intervals.dat + intervals.idx pair, reducing file-descriptor usage from N files (one per chromosome) to 2. The indexed format is backward-compatible with all misha interval functions.

PARAMETER	DESCRIPTION
`set_name`	Name of the 1D interval set to convert. TYPE: `str`
`remove_old`	If True, remove the old per-chromosome files after conversion. TYPE: `bool` DEFAULT: `False`
`force`	If True, re-convert even if the set is already indexed. TYPE: `bool` DEFAULT: `False`

RETURNS	DESCRIPTION
`None`

RAISES	DESCRIPTION
`ValueError`	If set_name is empty or the interval set does not exist.

See Also

gintervals_2d_convert_to_indexed : Convert a 2D interval set to indexed format. gintervals_is_indexed : Check if a set is already indexed. gintervals_save : Save intervals as a named set. gintervals_load : Load a named interval set.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> pm.gintervals_convert_to_indexed("my_intervals")
>>> pm.gintervals_convert_to_indexed("my_intervals", remove_old=True)

pymisha.gintervals_2d_convert_to_indexed ¶

gintervals_2d_convert_to_indexed(set_name, remove_old=False, force=False)

Convert a 2D big interval set to indexed format.

Converts per-chromosome-pair interval files into a single intervals2d.dat + intervals2d.idx pair. This dramatically reduces file-descriptor usage, especially for genomes with many chromosomes (from N*(N-1)/2 files to 2).
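Reading N as the number of chromosomes (the description does not define it), the formula above gives the following per-pair file counts for a few illustrative genome sizes. After conversion, each of these counts is replaced by two files.

```python
def per_pair_file_count(n_chroms):
    """N * (N - 1) / 2, the per-pair file count quoted above."""
    return n_chroms * (n_chroms - 1) // 2


for n in (5, 24, 100):
    print(f"{n} chromosomes: {per_pair_file_count(n)} files -> 2 files")
```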

PARAMETER	DESCRIPTION
`set_name`	Name of the 2D interval set to convert. TYPE: `str`
`remove_old`	If True, remove the old per-pair files after conversion. TYPE: `bool` DEFAULT: `False`
`force`	If True, re-convert even if the set is already indexed. TYPE: `bool` DEFAULT: `False`

RETURNS	DESCRIPTION
`None`

RAISES	DESCRIPTION
`ValueError`	If set_name is empty or the interval set does not exist.

See Also

gintervals_convert_to_indexed : Convert a 1D interval set to indexed format. gintervals_is_indexed : Check if a set is already indexed. gintervals_save : Save intervals as a named set. gintervals_load : Load a named interval set.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> pm.gintervals_2d_convert_to_indexed("my_2d_intervals")
>>> pm.gintervals_2d_convert_to_indexed("my_2d_intervals", remove_old=True)

pymisha.gintervals_is_indexed ¶

gintervals_is_indexed(intervals_set)

Check whether a big interval set is stored in indexed format.

Indexed format means the set uses intervals.idx/intervals.dat (1D) or intervals2d.idx/intervals2d.dat (2D) files instead of per-chromosome files.

PARAMETER	DESCRIPTION
`intervals_set`	Name of the interval set to check. TYPE: `str`

RETURNS	DESCRIPTION
`bool`	`True` if the set is a big (directory-based) interval set stored in indexed format, `False` otherwise (including non-directory sets).

See Also

gintervals_convert_to_indexed : Convert a 1D set to indexed format. gintervals_2d_convert_to_indexed : Convert a 2D set to indexed format. gintervals_exists : Check if a named interval set exists.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> pm.gintervals_is_indexed("annotations")
False

pymisha.giterator_cartesian_grid ¶

giterator_cartesian_grid(intervals1, expansion1, intervals2=None, expansion2=None, min_band_idx=None, max_band_idx=None)

Create a 2D cartesian-grid iterator as 2D intervals.

The grid is built from 1D interval centers and expansion breakpoints. For each center C and consecutive expansion pair (E[i], E[i+1]), one 1D window [C + E[i], C + E[i+1]) is created (clipped to chromosome bounds). The final result is the cartesian product of windows from intervals1 and intervals2.
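A plain-Python sketch of this construction follows. Beyond what is stated above, it assumes that an interval's center is `(start + end) // 2`, that breakpoints are sorted and de-duplicated before pairing, that windows left empty after clipping are dropped, that the band filter keeps pairs with `min_band_idx <= idx1 - idx2 <= max_band_idx` using each interval's position in the input, and that chromosome sizes are passed as a dict. `cartesian_grid` is hypothetical.

```python
def _windows(intervals, expansion, chrom_sizes):
    """1D windows [C + E[i], C + E[i+1]) around each center, clipped."""
    breaks = sorted(set(expansion))
    if len(breaks) < 2:
        raise ValueError("expansion needs at least two unique values")
    out = []  # (center_index, chrom, start, end)
    for idx, (chrom, start, end) in enumerate(intervals):
        center = (start + end) // 2  # assumption: integer midpoint
        for lo, hi in zip(breaks, breaks[1:]):
            w_start = max(0, center + lo)
            w_end = min(chrom_sizes[chrom], center + hi)
            if w_start < w_end:  # drop windows emptied by clipping
                out.append((idx, chrom, w_start, w_end))
    return out


def cartesian_grid(intervals1, expansion1, chrom_sizes, intervals2=None,
                   expansion2=None, min_band_idx=None, max_band_idx=None):
    """Hypothetical model of a cartesian grid of 2D cells."""
    if intervals2 is not None and (min_band_idx is not None or max_band_idx is not None):
        raise ValueError("band filtering requires intervals2=None")
    w1 = _windows(intervals1, expansion1, chrom_sizes)
    w2 = _windows(intervals2 if intervals2 is not None else intervals1,
                  expansion2 if expansion2 is not None else expansion1,
                  chrom_sizes)
    grid = []
    for i1, c1, s1, e1 in w1:
        for i2, c2, s2, e2 in w2:
            delta = i1 - i2  # center-index delta used by the band filter
            if min_band_idx is not None and delta < min_band_idx:
                continue
            if max_band_idx is not None and delta > max_band_idx:
                continue
            grid.append((c1, s1, e1, c2, s2, e2))
    return grid


peaks = [("1", 1000, 1200), ("1", 5000, 5200)]
grid = cartesian_grid(peaks, [-100, 0, 100], {"1": 10000}, min_band_idx=0, max_band_idx=0)
print(len(grid))  # 8
```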

PARAMETER	DESCRIPTION
`intervals1`	1D intervals with columns `chrom`, `start`, `end`. TYPE: `DataFrame`
`expansion1`	Expansion breakpoints around centers of `intervals1`. Must contain at least two unique values. TYPE: `sequence of int`
`intervals2`	Second 1D interval source. If `None`, `intervals1` is reused. TYPE: `DataFrame` DEFAULT: `None`
`expansion2`	Expansion breakpoints for `intervals2`. If `None`, `expansion1` is reused. TYPE: `sequence of int` DEFAULT: `None`
`min_band_idx`	Lower bound for center-index delta filtering (`idx1 - idx2`). Can be used only when `intervals2` is `None`. TYPE: `int` DEFAULT: `None`
`max_band_idx`	Upper bound for center-index delta filtering. Can be used only when `intervals2` is `None`. TYPE: `int` DEFAULT: `None`

RETURNS	DESCRIPTION
`DataFrame`	2D intervals with columns: `chrom1`, `start1`, `end1`, `chrom2`, `start2`, `end2`.

RAISES	DESCRIPTION
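`ValueError`	If inputs are invalid, for example when an expansion has fewer than two unique values, or when `min_band_idx` or `max_band_idx` is given together with `intervals2`.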

pymisha.giterator_intervals ¶

giterator_intervals(expr=None, intervals=None, iterator=None, interval_relative=False, partial_bins='clip')

Return the iterator intervals grid without evaluating track expressions.

This is useful for inspecting the bin boundaries that would be produced by a given iterator/interval combination before running a full extraction.

PARAMETER	DESCRIPTION
`expr`	Track expression (used to determine the implicit iterator when iterator is `None`). Pass `None` when an explicit numeric iterator is supplied. TYPE: `str` DEFAULT: `None`
`intervals`	Genomic scope. Defaults to `gintervals_all` (whole genome). TYPE: `DataFrame` DEFAULT: `None`
`iterator`	Numeric bin size or track name that defines the iterator. TYPE: `int or str` DEFAULT: `None`
`interval_relative`	When `True`, bins are aligned to each input interval's start rather than to chromosome position 0. Requires a numeric iterator. TYPE: `bool` DEFAULT: `False`
`partial_bins`	How to handle bins that do not fit entirely within an interval: `"clip"` truncates the last bin at the interval boundary (default, current behavior); `"drop"` discards bins smaller than the full bin size; `"exact"` behaves the same as `"drop"`. TYPE: `str` DEFAULT: `"clip"`

RETURNS	DESCRIPTION
`DataFrame`	DataFrame with columns `chrom`, `start`, `end`, `intervalID`.

RAISES	DESCRIPTION
`ValueError`	If neither expr nor iterator is provided.
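For a numeric iterator, the grid for a single interval can be sketched as below. The sketch rests on several assumptions. Bins are anchored at chromosome position 0, or at the interval start when `interval_relative=True`. `"clip"` truncates any partial edge bin, including a partial first bin when the start is not on the grid. `intervalID` is not modeled. `bin_grid` is a hypothetical helper that does not cover track-defined iterators.

```python
def bin_grid(chrom, start, end, bin_size, interval_relative=False, partial_bins="clip"):
    """Numeric-iterator bins covering [start, end) on one chromosome.

    Hypothetical helper. Bins are anchored at position 0, or at `start`
    when interval_relative=True. "clip" keeps truncated edge bins;
    "drop"/"exact" keep only full-size bins.
    """
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    if partial_bins not in ("clip", "drop", "exact"):
        raise ValueError(f"unknown partial_bins mode: {partial_bins!r}")
    origin = start if interval_relative else 0
    # First grid boundary at or before `start` on the grid anchored at `origin`.
    b = origin + ((start - origin) // bin_size) * bin_size
    bins = []
    while b < end:
        s, e = max(b, start), min(b + bin_size, end)
        if partial_bins == "clip" or e - s == bin_size:
            bins.append((chrom, s, e))
        b += bin_size
    return bins


print(bin_grid("1", 0, 200, 50))
print(bin_grid("1", 30, 170, 50, partial_bins="drop"))
```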

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> pm.giterator_intervals(intervals=pm.gintervals("1", 0, 200), iterator=50)
>>> pm.giterator_intervals("dense_track", pm.gintervals("1", 0, 1000))

See Also

gintervals_mapply : Apply a function to track values per interval.