Virtual Tracks

Functions for creating and managing virtual tracks, which define computed views over existing tracks with custom iterators and filters.

pymisha.gvtrack_create

gvtrack_create(vtrack_name, src, func='avg', params=None, sshift=0, eshift=0, **kwargs)

Create a virtual track.

A virtual track evaluates an aggregation function over a source track, intervals set, or genomic sequence within each iterator interval. Virtual tracks can be referenced by name anywhere a track expression is accepted (e.g., in gextract, gsummary, gdist). The virtual track persists in memory for the duration of the current session.

PARAMETER	DESCRIPTION
`vtrack_name`	Name for the virtual track. If a virtual track with this name already exists, it is silently overwritten. TYPE: `str`
`src`	Source for the virtual track. Can be:
    - A track name (str) -- any track in the database (dense, sparse, array, or 2D).
    - An intervals set name (str) -- used with interval-based functions like `'distance'`, `'coverage'`.
    - A DataFrame with columns `chrom`, `start`, `end` and one numeric value column -- acts as an in-memory sparse (value-based) track. Intervals must not overlap.
    - `None` -- for sequence-based functions, such as `'pwm'`, `'pwm.max'`, `'pwm.count'`, `'kmer.count'`, `'kmer.frac'`, `'masked.count'`, `'masked.frac'`.
    TYPE: `str, pandas.DataFrame, or None`
`func`	Aggregation function to apply. Supported functions include:
    - Track-based: `'avg'`, `'sum'`, `'min'`, `'max'`, `'stddev'`, `'nearest'`, `'quantile'`, `'coverage'`, `'exists'`, `'size'`, `'first'`, `'last'`, `'sample'`, `'lse'`, `'global.percentile'`
    - Distance-based (intervals source): `'distance'`, `'distance.center'`, `'distance.edge'`, `'neighbor.count'`
    - Position-based: `'first.pos.abs'`, `'first.pos.relative'`, `'last.pos.abs'`, `'last.pos.relative'`, `'min.pos.abs'`, `'min.pos.relative'`, `'max.pos.abs'`, `'max.pos.relative'`, `'sample.pos.abs'`, `'sample.pos.relative'`
    - 2D track: `'area'`, `'weighted.sum'`, `'exists'`, `'size'`, `'first'`, `'last'`, `'sample'`, `'global.percentile'`
    - Motif/PWM (src=None): `'pwm'`, `'pwm.max'`, `'pwm.max.pos'`, `'pwm.count'`
    - Edit distance (src=None): `'pwm.edit_distance'`, `'pwm.edit_distance.pos'`, `'pwm.max.edit_distance'`, `'pwm.edit_distance.lse'`, `'pwm.edit_distance.lse.pos'`
    - K-mer (src=None): `'kmer.count'`, `'kmer.frac'`
    - Masked sequence (src=None): `'masked.count'`, `'masked.frac'`
    TYPE: `str` DEFAULT: `'avg'`
`params`	Function-specific parameter. For example, a percentile in [0, 1] for `'quantile'`, a max-distance integer for `'neighbor.count'`, or a score threshold for `'pwm.count'`. TYPE: `float, str, or None` DEFAULT: `None`
`sshift`	Shift added to the start coordinate of each iterator interval before the virtual track function is evaluated. TYPE: `int` DEFAULT: `0`
`eshift`	Shift added to the end coordinate of each iterator interval before the virtual track function is evaluated. TYPE: `int` DEFAULT: `0`
`**kwargs`	Additional keyword arguments, depending on `func`:
    - `pssm` (numpy.ndarray or pandas.DataFrame) -- Position-specific scoring matrix with 4 columns (A, C, G, T) for PWM functions.
    - `prior` (float) -- Pseudocount added to PSSM frequencies (default 0.01 for PWM functions).
    - `bidirect` (bool) -- If True, score both DNA strands (PWM).
    - `extend` (bool) -- If True (default), extend the scanned sequence so boundary-anchored motifs retain full context.
    - `score_thresh` (float) -- Score threshold for `'pwm.count'` and edit distance functions.
    - `max_edits` (int or None) -- Maximum number of edits for edit distance functions. None (default) uses exact computation.
    - `max_indels` (int) -- Maximum insertions+deletions for `'pwm.edit_distance'`, `'pwm.edit_distance.pos'`, `'pwm.max.edit_distance'`. Default 0 (substitutions only).
    - `direction` (str) -- Score direction for edit distance functions: `'above'` (default) finds minimum edits to raise score above threshold; `'below'` finds minimum edits to lower score below threshold.
    - `score_min` (float or None) -- Minimum PWM score filter for edit distance functions. Windows below this are skipped.
    - `score_max` (float or None) -- Maximum PWM score filter for edit distance functions. Windows above this are skipped.
    - `strand` (int) -- Strand selection: 1 (forward), -1 (reverse), 0 (both). Used by kmer and single-strand PWM modes.
    - `kmer` (str) -- DNA k-mer sequence for kmer functions.
    - `spat_factor` (list of float) -- Spatial weighting factors for PWM functions.
    - `spat_bin` (int) -- Bin width for spatial weighting.
    - `spat_min` (int) -- Minimum scan position (1-based).
    - `spat_max` (int) -- Maximum scan position (1-based).
    - `filter` (pandas.DataFrame, str, list, or None) -- Genomic mask filter. See `gvtrack_filter` for details.
    DEFAULT: `{}`

RETURNS	DESCRIPTION
`None`

RAISES	DESCRIPTION
`ValueError`	If the filter source is invalid or refers to a non-intervals-type track.

See Also

gvtrack_info : Retrieve the configuration of a virtual track.
gvtrack_iterator : Override iterator shifts for a virtual track.
gvtrack_iterator_2d : Set 2D iterator shifts for a virtual track.
gvtrack_filter : Attach or clear a genomic mask filter.
gvtrack_rm : Remove a single virtual track.
gvtrack_ls : List all virtual tracks.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()

Create a virtual track with a max aggregation:

>>> pm.gvtrack_create("vt_max", "dense_track", func="max")
>>> pm.gextract("vt_max", pm.gintervals(["1"], [0], [10000]), iterator=1000)
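To make the idea of "an aggregation function evaluated within each iterator interval" concrete, here is a toy model in plain Python. It is only a sketch and not how pymisha is implemented: the dense track is assumed to be a list of fixed-size bins, the helper names `aggregate` and `extract` are hypothetical, and clamping the shifted window at coordinate 0 is an assumption.

```python
import math

def aggregate(track, bin_size, start, end, func=max):
    """Apply `func` to the bins of a toy dense track that overlap [start, end)."""
    start, end = max(start, 0), max(end, 0)   # assumed clamping at coordinate 0
    if end <= start:
        return math.nan
    first = start // bin_size
    last = math.ceil(end / bin_size)          # exclusive bin index
    values = [v for v in track[first:last] if not math.isnan(v)]
    return func(values) if values else math.nan

def extract(track, bin_size, start, end, iterator, func=max, sshift=0, eshift=0):
    """Tile [start, end) into iterator intervals and aggregate a shifted window for each."""
    rows = []
    for s in range(start, end, iterator):
        e = min(s + iterator, end)
        rows.append((s, e, aggregate(track, bin_size, s + sshift, e + eshift, func)))
    return rows

track = [1.0, 5.0, 2.0, 8.0, 3.0, 9.0]        # six 500 bp bins covering 0..3000
rows = extract(track, 500, 0, 3000, iterator=1000)
widened = extract(track, 500, 0, 3000, iterator=1000,
                  func=sum, sshift=-500, eshift=500)
```

In this model `rows` holds one maximum per 1000 bp interval, while `widened` sums over a window that reaches 500 bp past each side of the reported interval; the reported coordinates stay those of the iterator interval.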

Create a quantile virtual track with a median (0.5) parameter:

>>> pm.gvtrack_create("vt_q50", "dense_track", func="quantile", params=0.5)

Create a distance virtual track from an intervals source:

>>> pm.gvtrack_create("vt_dist", "annotations", func="distance")

Create a PWM virtual track scanning both strands:

>>> import numpy as np
>>> pssm = np.array([[0.7, 0.1, 0.1, 0.1],
...                  [0.1, 0.7, 0.1, 0.1],
...                  [0.1, 0.1, 0.7, 0.1],
...                  [0.1, 0.1, 0.1, 0.7]])
>>> pm.gvtrack_create("motif", None, func="pwm",
...                   pssm=pssm, bidirect=True, prior=0.01)
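For intuition about what `pssm`, `prior`, and `bidirect` control, the following plain-Python sketch scores every window of a sequence against a matrix. It is an illustration under stated assumptions, not pymisha's scoring code: the pseudocount is assumed to be added to each frequency and the row renormalised, the two strands are assumed to be combined by taking the better score, and the helper names are hypothetical.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def log_pssm(pssm, prior=0.01):
    """Add `prior` to every frequency, renormalise each row, and take logs (assumed scheme)."""
    rows = []
    for row in pssm:
        smoothed = [p + prior for p in row]
        total = sum(smoothed)
        rows.append([math.log(p / total) for p in smoothed])
    return rows

def window_score(logp, window):
    """Log-likelihood of one window: sum of per-position log probabilities."""
    return sum(logp[i][BASES.index(base)] for i, base in enumerate(window))

def scan(pssm, seq, prior=0.01, bidirect=False):
    """Score each start position; with bidirect, keep the better of the two strands."""
    logp = log_pssm(pssm, prior)
    k = len(logp)
    scores = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        score = window_score(logp, window)
        if bidirect:
            revcomp = window.translate(COMPLEMENT)[::-1]
            score = max(score, window_score(logp, revcomp))
        scores.append(score)
    return scores

pssm = [[0.7, 0.1, 0.1, 0.1],
        [0.1, 0.7, 0.1, 0.1],
        [0.1, 0.1, 0.7, 0.1],
        [0.1, 0.1, 0.1, 0.7]]
scores = scan(pssm, "TTACGTTT", bidirect=True)
best_position = scores.index(max(scores))   # start of the best-matching window
```

How the per-window scores are then reduced (for instance to a maximum, a position, or a count above `score_thresh`) depends on the chosen `func`.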

Create a k-mer counting virtual track:

>>> pm.gvtrack_create("cg_count", None, func="kmer.count",
...                   kmer="CG", strand=1)
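As a rough mental model of the k-mer functions, the sketch below counts occurrences of a k-mer in a plain string. It is not pymisha's implementation: the helper names are hypothetical, the `strand=0` case is assumed to be the sum of both strand counts, and the denominator used for the fraction (the number of possible k-mer start positions) is an assumption.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def count_forward(seq, kmer):
    """Count (possibly overlapping) occurrences of `kmer` on the given strand."""
    k = len(kmer)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == kmer)

def kmer_count(seq, kmer, strand=1):
    """strand: 1 = forward, -1 = reverse, 0 = both (assumed to be the sum of the two)."""
    revcomp = kmer.translate(COMPLEMENT)[::-1]
    forward = count_forward(seq, kmer)
    reverse = count_forward(seq, revcomp)   # a reverse-strand hit reads as the revcomp
    if strand == 1:
        return forward
    if strand == -1:
        return reverse
    return forward + reverse

def kmer_frac(seq, kmer, strand=1):
    """Count divided by the number of possible start positions (assumed definition)."""
    positions = len(seq) - len(kmer) + 1
    return kmer_count(seq, kmer, strand) / positions if positions > 0 else float("nan")

cg = kmer_count("ACGCGT", "CG")
cg_frac = kmer_frac("ACGCGT", "CG")
```

Note that under this naive summing rule a palindromic k-mer such as `CG` would be counted once per strand when `strand=0`.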

pymisha.gvtrack_ls

gvtrack_ls()

List all currently defined virtual tracks.

Returns the names of all virtual tracks that have been created in the current session via gvtrack_create. Unlike the R counterpart, this function does not support pattern filtering; use standard Python list comprehensions to filter the result if needed.

RETURNS	DESCRIPTION
`list of str`	Names of all virtual tracks in the current session. Returns an empty list if no virtual tracks have been created.

See Also

gvtrack_create : Create a new virtual track.
gvtrack_info : Retrieve configuration of a virtual track.
gvtrack_rm : Remove a single virtual track.
gvtrack_clear : Remove all virtual tracks.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> pm.gvtrack_clear()
>>> pm.gvtrack_ls()
[]

>>> pm.gvtrack_create("vt1", "dense_track", func="avg")
>>> pm.gvtrack_create("vt2", "dense_track", func="max")
>>> pm.gvtrack_ls()
['vt1', 'vt2']

Filter with a list comprehension:

>>> [v for v in pm.gvtrack_ls() if "2" in v]
['vt2']
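Since pattern filtering is left to the caller, shell-style wildcards or regular expressions from the standard library can be applied to the returned list. The sketch below uses a hard-coded list as a stand-in for the result of `pm.gvtrack_ls()`; the track names are illustrative only.

```python
import fnmatch
import re

# Stand-in for pm.gvtrack_ls(); the names are illustrative.
names = ["vt1", "vt2", "motif", "cg_count"]

# Shell-style wildcard match.
vt_like = fnmatch.filter(names, "vt*")

# Regular-expression match.
counts = [name for name in names if re.search(r"_count$", name)]
```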

pymisha.gvtrack_info

gvtrack_info(vtrack_name)

Return the definition of a virtual track.

Retrieves the full internal configuration dictionary for a previously created virtual track. This is useful for inspecting or programmatically modifying virtual track settings.

PARAMETER	DESCRIPTION
`vtrack_name`	Name of an existing virtual track. TYPE: `str`

RETURNS	DESCRIPTION
`dict`	A copy of the virtual track configuration dictionary. Keys always include `'src'`, `'func'`, `'params'`, `'sshift'`, `'eshift'`, `'filter'`, `'filter_key'`, and `'filter_stats'`. Additional keys (e.g., `'pssm'`, `'bidirect'`, `'kmer'`, `'dim'`) are present when supplied at creation time or via `gvtrack_iterator` / `gvtrack_iterator_2d`.

RAISES	DESCRIPTION
`KeyError`	If no virtual track with the given name exists.

See Also

gvtrack_create : Create a new virtual track.
gvtrack_ls : List all virtual tracks.
gvtrack_filter : Attach or clear a genomic mask filter.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> pm.gvtrack_create("vt1", "dense_track", func="max")
>>> info = pm.gvtrack_info("vt1")
>>> info["func"]
'max'
>>> info["src"]
'dense_track'
>>> info["sshift"]
0
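One way the returned dictionary might be used programmatically is to derive the arguments for a modified copy of a virtual track. The helper below is a hypothetical sketch, not part of pymisha. It assumes that `'filter_key'` and `'filter_stats'` are internal bookkeeping, that `'dim'` must be reapplied through `gvtrack_iterator`, and that the stored `'filter'` value can be passed back to `gvtrack_create`; none of these assumptions is confirmed here. The sample dictionary stands in for a real `gvtrack_info` result.

```python
# Keys assumed NOT to be valid creation arguments (assumption, see lead-in).
SKIP_KEYS = {"filter_key", "filter_stats", "dim"}

def creation_kwargs(info, **overrides):
    """Build keyword arguments for re-creating a virtual track from its info dict."""
    kwargs = {key: value for key, value in info.items() if key not in SKIP_KEYS}
    kwargs.update(overrides)
    return kwargs

# Stand-in for pm.gvtrack_info("vt1").
info = {"src": "dense_track", "func": "max", "params": None,
        "sshift": 0, "eshift": 0,
        "filter": None, "filter_key": None, "filter_stats": None}

kwargs = creation_kwargs(info, func="min")
# With pymisha loaded, this could then be passed on, for example:
# pm.gvtrack_create("vt1_min", **kwargs)
```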

pymisha.gvtrack_iterator

gvtrack_iterator(vtrack_name, dim=None, sshift=0, eshift=0)

Define modification rules for the 1D iterator of a virtual track.

By default a virtual track is evaluated over the same iterator intervals as the calling function (e.g., gextract, gsummary). This function allows independent control of the genomic window the virtual track sees by applying custom start/end shifts. It can also project a 2D iterator down to one of its 1D dimensions.

PARAMETER	DESCRIPTION
`vtrack_name`	Name of an existing virtual track. TYPE: `str`
`dim`	Dimension projection for 2D iterators:
    - `None` or `0` -- no conversion; shifts apply to the 1D iterator directly.
    - `1` -- convert a 2D iterator interval `(chrom1, start1, end1, chrom2, start2, end2)` to `(chrom1, start1, end1)` before applying shifts.
    - `2` -- convert to `(chrom2, start2, end2)` before applying shifts.
    TYPE: `int or None` DEFAULT: `None`
`sshift`	Value added to the start coordinate of each iterator interval. Negative values expand the window upstream. TYPE: `int` DEFAULT: `0`
`eshift`	Value added to the end coordinate of each iterator interval. Positive values expand the window downstream. TYPE: `int` DEFAULT: `0`

RETURNS	DESCRIPTION
`None`

RAISES	DESCRIPTION
`KeyError`	If no virtual track with the given name exists.

See Also

gvtrack_create : Create a new virtual track.
gvtrack_iterator_2d : Set 2D iterator shifts for a virtual track.
gvtrack_filter : Attach a genomic mask filter.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()

Shift the evaluation window 200 bp downstream:

>>> pm.gvtrack_create("vt1", "dense_track", func="avg")
>>> pm.gvtrack_iterator("vt1", sshift=200, eshift=200)
>>> pm.gextract("dense_track", "vt1",
...             pm.gintervals(["1"], [0], [500]))

Expand the window symmetrically by 500 bp in each direction:

>>> pm.gvtrack_create("vt2", "dense_track", func="sum")
>>> pm.gvtrack_iterator("vt2", sshift=-500, eshift=500)

Project dimension 1 of a 2D iterator for a 1D virtual track:

>>> pm.gvtrack_create("vt3", "dense_track", func="avg")
>>> pm.gvtrack_iterator("vt3", dim=1)
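The effect of `dim`, `sshift`, and `eshift` on a single iterator interval can be sketched with plain tuples. The function below is a hypothetical illustration of the rules in the parameter table, not pymisha code; how pymisha treats windows that extend past chromosome boundaries is not modelled.

```python
def project_and_shift(interval, dim=None, sshift=0, eshift=0):
    """Project a 2D interval to 1D (if requested), then add the shifts."""
    if dim in (None, 0):
        chrom, start, end = interval          # already a 1D interval
    elif dim == 1:
        chrom, start, end = interval[0:3]     # (chrom1, start1, end1)
    elif dim == 2:
        chrom, start, end = interval[3:6]     # (chrom2, start2, end2)
    else:
        raise ValueError("dim must be None, 0, 1, or 2")
    return (chrom, start + sshift, end + eshift)

# 1D interval shifted 200 bp downstream.
shifted = project_and_shift(("1", 0, 500), sshift=200, eshift=200)

# Second dimension of a 2D interval, expanded by 500 bp on each side.
projected = project_and_shift(("1", 0, 5000, "2", 1000, 2000),
                              dim=2, sshift=-500, eshift=500)
```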

pymisha.gvtrack_iterator_2d

gvtrack_iterator_2d(vtrack_name, sshift1=0, eshift1=0, sshift2=0, eshift2=0)

Define modification rules for the 2D iterator of a virtual track.

Sets independent start/end shifts for both dimensions of a 2D iterator interval. The shifts are added to the coordinates of each 2D iterator interval before the virtual track function is evaluated.

PARAMETER	DESCRIPTION
`vtrack_name`	Name of an existing virtual track. TYPE: `str`
`sshift1`	Value added to the `start1` coordinate of each 2D iterator interval. TYPE: `int` DEFAULT: `0`
`eshift1`	Value added to the `end1` coordinate of each 2D iterator interval. TYPE: `int` DEFAULT: `0`
`sshift2`	Value added to the `start2` coordinate of each 2D iterator interval. TYPE: `int` DEFAULT: `0`
`eshift2`	Value added to the `end2` coordinate of each 2D iterator interval. TYPE: `int` DEFAULT: `0`

RETURNS	DESCRIPTION
`None`

RAISES	DESCRIPTION
`KeyError`	If no virtual track with the given name exists.

See Also

gvtrack_create : Create a new virtual track.
gvtrack_iterator : Set 1D iterator shifts or project a 2D dimension.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> pm.gvtrack_create("vt_2d", "rects_track", func="area")
>>> pm.gvtrack_iterator_2d("vt_2d", sshift1=1000, eshift1=2000)
>>> pm.gextract("rects_track", "vt_2d",
...             pm.gintervals_2d(["1"], [0], [5000], ["2"], [0], [5000]))
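The arithmetic behind the four shifts is simple enough to check by hand. This sketch applies them to one 2D interval represented as a tuple; the helper name `shift_2d` is hypothetical and boundary handling is not modelled.

```python
def shift_2d(interval, sshift1=0, eshift1=0, sshift2=0, eshift2=0):
    """Add the four shifts to (chrom1, start1, end1, chrom2, start2, end2)."""
    chrom1, start1, end1, chrom2, start2, end2 = interval
    return (chrom1, start1 + sshift1, end1 + eshift1,
            chrom2, start2 + sshift2, end2 + eshift2)

# Same shifts as in the example above: only the first dimension moves.
window = shift_2d(("1", 0, 5000, "2", 0, 5000), sshift1=1000, eshift1=2000)
```

Here the first dimension becomes 1000..7000 while the second dimension is left at 0..5000.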

pymisha.gvtrack_filter

gvtrack_filter(vtrack_name, mask=None, **kwargs)

Attach or clear a genomic mask filter on a virtual track.

When a filter is attached, the virtual track function is evaluated only over the unmasked regions -- that is, regions NOT covered by the filter intervals. Masked positions are excluded from aggregation, and an iterator interval that is entirely masked returns NaN. The filter persists on the virtual track until explicitly cleared.

Filters are applied after iterator modifiers (sshift/eshift/dim). The order of operations is: (1) apply iterator shifts, (2) subtract the mask from the shifted intervals, (3) evaluate the virtual track function over the remaining unmasked segments.

PARAMETER	DESCRIPTION
`vtrack_name`	Name of an existing virtual track. TYPE: `str`
`mask`	The genomic mask to apply. Accepted forms:
    - A `pandas.DataFrame` with columns `chrom`, `start`, `end` -- intervals to mask.
    - A `str` naming an intervals set in the database.
    - A `str` naming an intervals-type (sparse) track.
    - A `list` or `tuple` of any combination of the above; all sources are unified into a single mask.
    - `None` -- clears any existing filter from the virtual track.
    TYPE: `pandas.DataFrame, str, list, or None` DEFAULT: `None`
`filter`	Backward-compatible alias for `mask`. TYPE: `pandas.DataFrame, str, list, or None`

RETURNS	DESCRIPTION
`None`

RAISES	DESCRIPTION
`KeyError`	If no virtual track with the given name exists.
`ValueError`	If a string filter source is not a recognized intervals set or intervals-type track, or if a DataFrame is missing required columns.

See Also

gvtrack_create : Create a virtual track (filter can also be set at creation time via the filter keyword argument).
gvtrack_info : Inspect a virtual track's configuration including its filter.
gvtrack_iterator : Set iterator shifts (applied before the filter).

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()

Attach a filter to exclude specific regions:

>>> pm.gvtrack_create("vt1", "dense_track", func="avg")
>>> mask = pm.gintervals(["1", "1"], [100, 500], [200, 600])
>>> pm.gvtrack_filter("vt1", filter=mask)
>>> pm.gvtrack_info("vt1")["filter"] is not None
True

Clear the filter:

>>> pm.gvtrack_filter("vt1", filter=None)
>>> pm.gvtrack_info("vt1")["filter"] is None
True

Use multiple filter sources (automatically unified):

>>> mask1 = pm.gintervals(["1"], [100], [200])
>>> mask2 = pm.gintervals(["1"], [500], [600])
>>> pm.gvtrack_filter("vt1", filter=[mask1, mask2])
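The three-step order of operations described above (shift, subtract the mask, evaluate) can be sketched for a single chromosome in plain Python. This is a toy model, not pymisha's implementation: the track is assumed to be a per-base list of values, the mask a list of `(start, end)` pairs, and the helper names are hypothetical.

```python
import math

def subtract_mask(start, end, mask):
    """Return the parts of [start, end) not covered by any (mstart, mend) in mask."""
    segments = []
    cursor = start
    for mstart, mend in sorted(mask):
        if mend <= cursor or mstart >= end:
            continue                           # mask interval does not touch the rest
        if mstart > cursor:
            segments.append((cursor, mstart))  # unmasked gap before this mask interval
        cursor = max(cursor, mend)
    if cursor < end:
        segments.append((cursor, end))
    return segments

def filtered_avg(values, start, end, mask, sshift=0, eshift=0):
    """Average per-base values over the unmasked part of the shifted interval."""
    s, e = start + sshift, end + eshift        # (1) apply iterator shifts
    segments = subtract_mask(s, e, mask)       # (2) subtract the mask
    kept = [values[i] for a, b in segments for i in range(a, b)]
    return sum(kept) / len(kept) if kept else math.nan   # (3) evaluate, NaN if all masked

values = list(range(10))                       # toy per-base track: 0, 1, ..., 9
partly_masked = filtered_avg(values, 0, 10, [(2, 8)])
fully_masked = filtered_avg(values, 0, 4, [(0, 10)])
shifted_then_masked = filtered_avg(values, 0, 4, [(2, 8)], sshift=2, eshift=6)
```

In the last call the shift is applied first, so the mask is subtracted from 2..10 rather than from 0..4, leaving only positions 8 and 9.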

pymisha.gvtrack_rm

gvtrack_rm(vtrack_name)

Remove a virtual track.

Deletes a single virtual track from the current session. If the named virtual track does not exist, the call is silently ignored (no error is raised).

PARAMETER	DESCRIPTION
`vtrack_name`	Name of the virtual track to remove. TYPE: `str`

RETURNS	DESCRIPTION
`None`

See Also

gvtrack_create : Create a new virtual track.
gvtrack_clear : Remove all virtual tracks at once.
gvtrack_ls : List all virtual tracks.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> pm.gvtrack_clear()
>>> pm.gvtrack_create("vt1", "dense_track", func="max")
>>> pm.gvtrack_create("vt2", "dense_track", func="min")
>>> pm.gvtrack_ls()
['vt1', 'vt2']
>>> pm.gvtrack_rm("vt1")
>>> pm.gvtrack_ls()
['vt2']

Removing a non-existent track is a no-op:

>>> pm.gvtrack_rm("does_not_exist")
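The session-level behavior documented for creating, removing, and clearing virtual tracks matches that of a simple in-memory dictionary. The sketch below models it that way purely for illustration; the `_registry` dict and function names are hypothetical and say nothing about how pymisha stores virtual tracks internally.

```python
_registry = {}   # hypothetical stand-in for the session's virtual track store

def create(name, **config):
    """Creating under an existing name silently overwrites the old entry."""
    _registry[name] = dict(config)

def rm(name):
    """Removing a missing name is a no-op rather than an error."""
    _registry.pop(name, None)

def ls():
    return list(_registry)

def clear():
    _registry.clear()

create("vt1", func="max")
create("vt1", func="min")        # overwrite, no error
create("vt2", func="avg")
rm("does_not_exist")             # silently ignored
rm("vt1")
remaining = ls()
clear()
after_clear = ls()
```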

pymisha.gvtrack_clear

gvtrack_clear()

Remove all virtual tracks.

Clears the entire virtual track registry for the current session. After this call, gvtrack_ls() returns an empty list. This is useful for resetting state between analyses or in test fixtures.

RETURNS	DESCRIPTION
`None`

See Also

gvtrack_rm : Remove a single virtual track by name.
gvtrack_ls : List all virtual tracks.
gvtrack_create : Create a new virtual track.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> pm.gvtrack_clear()
>>> pm.gvtrack_create("vt1", "dense_track", func="avg")
>>> pm.gvtrack_create("vt2", "dense_track", func="max")
>>> len(pm.gvtrack_ls())
2
>>> pm.gvtrack_clear()
>>> pm.gvtrack_ls()
[]