Virtual Tracks

Functions for creating and managing virtual tracks, which define computed views over existing tracks with custom iterators and filters.

pymisha.gvtrack_create

gvtrack_create(vtrack_name, src, func='avg', params=None, sshift=0, eshift=0, **kwargs)

Create a virtual track.

A virtual track evaluates an aggregation function over a source track, intervals set, or genomic sequence within each iterator interval. Virtual tracks can be referenced by name anywhere a track expression is accepted (e.g., in gextract, gsummary, gdist). The virtual track persists in memory for the duration of the current session.
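
As a mental model only, the evaluation described above can be sketched in plain Python. The sketch assumes a hypothetical source holding one value per base pair and half-open (start, end) iterator intervals; evaluate_vtrack and AGGREGATORS are illustrative names, not part of pymisha, and the real implementation reads database tracks rather than Python lists.

```python
from statistics import mean

# Illustrative aggregators keyed by a few of the documented function names.
AGGREGATORS = {"avg": mean, "sum": sum, "min": min, "max": max}


def evaluate_vtrack(values, intervals, func="avg"):
    """Apply one aggregation per iterator interval.

    values    -- list with one number per base pair (hypothetical source)
    intervals -- list of half-open (start, end) iterator intervals
    """
    aggregate = AGGREGATORS[func]
    return [aggregate(values[start:end]) for start, end in intervals]


values = [1, 3, 5, 7, 2, 4, 6, 8]
windows = [(0, 4), (4, 8)]
print(evaluate_vtrack(values, windows, func="avg"))
print(evaluate_vtrack(values, windows, func="max"))
```

Each iterator interval yields exactly one number, which is why a virtual track can stand in wherever a track expression is expected.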

PARAMETER DESCRIPTION
vtrack_name

Name for the virtual track. If a virtual track with this name already exists, it is silently overwritten.

TYPE: str

src

Source for the virtual track. Can be:

  • A track name (str) -- any track in the database (dense, sparse, array, or 2D).
  • An intervals set name (str) -- used with interval-based functions like 'distance', 'coverage'.
  • A DataFrame with columns chrom, start, end and one numeric value column -- acts as an in-memory sparse (value-based) track. Intervals must not overlap.
  • None -- for sequence-based functions: the PWM, edit-distance, k-mer, and masked-sequence functions listed under func (e.g., 'pwm', 'pwm.max', 'pwm.count', 'kmer.count', 'kmer.frac', 'masked.count', 'masked.frac').

TYPE: str, pandas.DataFrame, or None

func

Aggregation function to apply. Supported functions include:

  • Track-based: 'avg', 'sum', 'min', 'max', 'stddev', 'nearest', 'quantile', 'coverage', 'exists', 'size', 'first', 'last', 'sample', 'lse', 'global.percentile'
  • Distance-based (intervals source): 'distance', 'distance.center', 'distance.edge', 'neighbor.count'
  • Position-based: 'first.pos.abs', 'first.pos.relative', 'last.pos.abs', 'last.pos.relative', 'min.pos.abs', 'min.pos.relative', 'max.pos.abs', 'max.pos.relative', 'sample.pos.abs', 'sample.pos.relative'
  • 2D track: 'area', 'weighted.sum', 'exists', 'size', 'first', 'last', 'sample', 'global.percentile'
  • Motif/PWM (src=None): 'pwm', 'pwm.max', 'pwm.max.pos', 'pwm.count'
  • Edit distance (src=None): 'pwm.edit_distance', 'pwm.edit_distance.pos', 'pwm.max.edit_distance', 'pwm.edit_distance.lse', 'pwm.edit_distance.lse.pos'
  • K-mer (src=None): 'kmer.count', 'kmer.frac'
  • Masked sequence (src=None): 'masked.count', 'masked.frac'

TYPE: str DEFAULT: 'avg'

params

Function-specific parameter. For example, a percentile in [0, 1] for 'quantile', a max-distance integer for 'neighbor.count', or a score threshold for 'pwm.count'.

TYPE: float, str, or None DEFAULT: None

sshift

Shift added to the start coordinate of each iterator interval before the virtual track function is evaluated.

TYPE: int DEFAULT: 0

eshift

Shift added to the end coordinate of each iterator interval before the virtual track function is evaluated.

TYPE: int DEFAULT: 0

**kwargs

Additional keyword arguments, depending on func:

  • pssm (numpy.ndarray or pandas.DataFrame) -- Position-specific scoring matrix with 4 columns (A, C, G, T) for PWM functions.
  • prior (float) -- Pseudocount added to PSSM frequencies (default 0.01 for PWM functions).
  • bidirect (bool) -- If True, score both DNA strands (PWM).
  • extend (bool) -- If True (default), extend the scanned sequence so boundary-anchored motifs retain full context.
  • score_thresh (float) -- Score threshold for 'pwm.count' and edit distance functions.
  • max_edits (int or None) -- Maximum number of edits for edit distance functions. None (default) uses exact computation.
  • max_indels (int) -- Maximum insertions+deletions for 'pwm.edit_distance', 'pwm.edit_distance.pos', 'pwm.max.edit_distance'. Default 0 (substitutions only).
  • direction (str) -- Score direction for edit distance functions: 'above' (default) finds minimum edits to raise score above threshold; 'below' finds minimum edits to lower score below threshold.
  • score_min (float or None) -- Minimum PWM score filter for edit distance functions. Windows below this are skipped.
  • score_max (float or None) -- Maximum PWM score filter for edit distance functions. Windows above this are skipped.
  • strand (int) -- Strand selection: 1 (forward), -1 (reverse), 0 (both). Used by kmer and single-strand PWM modes.
  • kmer (str) -- DNA k-mer sequence for kmer functions.
  • spat_factor (list of float) -- Spatial weighting factors for PWM functions.
  • spat_bin (int) -- Bin width for spatial weighting.
  • spat_min (int) -- Minimum scan position (1-based).
  • spat_max (int) -- Maximum scan position (1-based).
  • filter (pandas.DataFrame, str, list, or None) -- Genomic mask filter. See gvtrack_filter for details.

DEFAULT: {}

RETURNS DESCRIPTION
None

RAISES DESCRIPTION
ValueError

If the filter source is invalid or refers to a non-intervals-type track.

See Also

  • gvtrack_info : Retrieve the configuration of a virtual track.
  • gvtrack_iterator : Override iterator shifts for a virtual track.
  • gvtrack_iterator_2d : Set 2D iterator shifts for a virtual track.
  • gvtrack_filter : Attach or clear a genomic mask filter.
  • gvtrack_rm : Remove a single virtual track.
  • gvtrack_ls : List all virtual tracks.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()

Create a virtual track with a max aggregation:

>>> pm.gvtrack_create("vt_max", "dense_track", func="max")
>>> pm.gextract("vt_max", pm.gintervals(["1"], [0], [10000]), iterator=1000)

Create a quantile virtual track with a median (0.5) parameter:

>>> pm.gvtrack_create("vt_q50", "dense_track", func="quantile", params=0.5)

Create a distance virtual track from an intervals source:

>>> pm.gvtrack_create("vt_dist", "annotations", func="distance")

Create a PWM virtual track scanning both strands:

>>> import numpy as np
>>> pssm = np.array([[0.7, 0.1, 0.1, 0.1],
...                  [0.1, 0.7, 0.1, 0.1],
...                  [0.1, 0.1, 0.7, 0.1],
...                  [0.1, 0.1, 0.1, 0.7]])
>>> pm.gvtrack_create("motif", None, func="pwm",
...                   pssm=pssm, bidirect=True, prior=0.01)

Create a k-mer counting virtual track:

>>> pm.gvtrack_create("cg_count", None, func="kmer.count",
...                   kmer="CG", strand=1)
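
The strand argument in the k-mer example can be illustrated with a plain-Python counter. This is a sketch, not pymisha's implementation: count_kmer is a hypothetical helper, it counts overlapping matches, and it assumes that strand=0 adds the forward and reverse counts, so a palindromic k-mer such as "CG" is counted once per strand.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def _count_overlapping(seq, kmer):
    """Count occurrences of kmer in seq, allowing overlaps."""
    k = len(kmer)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == kmer)


def count_kmer(seq, kmer, strand=0):
    """Count kmer on the forward (1), reverse (-1), or both (0) strands."""
    forward = _count_overlapping(seq, kmer)
    # A reverse-strand hit is a forward-strand hit of the reverse complement.
    reverse = _count_overlapping(seq, reverse_complement(kmer))
    if strand == 1:
        return forward
    if strand == -1:
        return reverse
    if strand == 0:
        return forward + reverse  # assumption: both strands are summed
    raise ValueError("strand must be 1, -1, or 0")


print(count_kmer("ACGTTCGA", "CG", strand=1))
print(count_kmer("ACGTTCGA", "CG", strand=0))
```

Under that assumption, strand=1 is the natural choice for palindromic k-mers when each site should be counted once.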

pymisha.gvtrack_ls

gvtrack_ls()

List all currently defined virtual tracks.

Returns the names of all virtual tracks that have been created in the current session via gvtrack_create. Unlike the R counterpart, this function does not support pattern filtering; use standard Python list comprehensions to filter the result if needed.

RETURNS DESCRIPTION
list of str

Names of all virtual tracks in the current session. Returns an empty list if no virtual tracks have been created.

See Also

  • gvtrack_create : Create a new virtual track.
  • gvtrack_info : Retrieve configuration of a virtual track.
  • gvtrack_rm : Remove a single virtual track.
  • gvtrack_clear : Remove all virtual tracks.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> pm.gvtrack_clear()
>>> pm.gvtrack_ls()
[]
>>> pm.gvtrack_create("vt1", "dense_track", func="avg")
>>> pm.gvtrack_create("vt2", "dense_track", func="max")
>>> pm.gvtrack_ls()
['vt1', 'vt2']

Filter with a list comprehension:

>>> [v for v in pm.gvtrack_ls() if "2" in v]
['vt2']
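
Where glob-style or regular-expression filtering is wanted, the standard library can fill the gap left by the missing pattern argument. The sketch below uses a hard-coded list as a stand-in for the result of pm.gvtrack_ls(); filter_names and filter_names_regex are hypothetical helpers, not pymisha functions.

```python
import re
from fnmatch import fnmatchcase


def filter_names(names, pattern):
    """Keep names matching a case-sensitive glob pattern such as 'motif_*'."""
    return [name for name in names if fnmatchcase(name, pattern)]


def filter_names_regex(names, regex):
    """Keep names in which the regular expression matches anywhere."""
    compiled = re.compile(regex)
    return [name for name in names if compiled.search(name)]


# Stand-in for pm.gvtrack_ls(); replace with the real call in a session.
names = ["vt1", "vt2", "motif_fwd", "motif_rev"]
print(filter_names(names, "motif_*"))
print(filter_names_regex(names, r"^vt\d+$"))
```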

pymisha.gvtrack_info

gvtrack_info(vtrack_name)

Return the definition of a virtual track.

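Retrieves the full internal configuration dictionary for a previously created virtual track. This is useful for inspecting a virtual track's settings or reusing them programmatically. The returned dictionary is a copy, so assigning to its keys does not change the virtual track; use gvtrack_create, gvtrack_iterator, gvtrack_iterator_2d, or gvtrack_filter to change settings.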

PARAMETER DESCRIPTION
vtrack_name

Name of an existing virtual track.

TYPE: str

RETURNS DESCRIPTION
dict

A copy of the virtual track configuration dictionary. Keys always include 'src', 'func', 'params', 'sshift', 'eshift', 'filter', 'filter_key', and 'filter_stats'. Additional keys (e.g., 'pssm', 'bidirect', 'kmer', 'dim') are present when supplied at creation time or via gvtrack_iterator / gvtrack_iterator_2d.

RAISES DESCRIPTION
KeyError

If no virtual track with the given name exists.

See Also

  • gvtrack_create : Create a new virtual track.
  • gvtrack_ls : List all virtual tracks.
  • gvtrack_filter : Attach or clear a genomic mask filter.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> pm.gvtrack_create("vt1", "dense_track", func="max")
>>> info = pm.gvtrack_info("vt1")
>>> info["func"]
'max'
>>> info["src"]
'dense_track'
>>> info["sshift"]
0

pymisha.gvtrack_iterator

gvtrack_iterator(vtrack_name, dim=None, sshift=0, eshift=0)

Define modification rules for the 1D iterator of a virtual track.

By default a virtual track is evaluated over the same iterator intervals as the calling function (e.g., gextract, gsummary). This function allows independent control of the genomic window the virtual track sees by applying custom start/end shifts. It can also project a 2D iterator down to one of its 1D dimensions.
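
The effect of the projection and the shifts on a single iterator interval can be sketched as follows. This is an illustration of the documented rules, not pymisha code: modify_iterator_interval is a hypothetical helper, intervals are plain tuples, and clamping to chromosome boundaries is not modeled.

```python
def modify_iterator_interval(interval, dim=None, sshift=0, eshift=0):
    """Project a 2D interval to 1D if requested, then apply the shifts.

    interval -- (chrom, start, end) for a 1D iterator, or
                (chrom1, start1, end1, chrom2, start2, end2) for a 2D one
    """
    if dim in (None, 0):
        chrom, start, end = interval       # 1D iterator, no conversion
    elif dim == 1:
        chrom, start, end = interval[0:3]  # keep the first dimension
    elif dim == 2:
        chrom, start, end = interval[3:6]  # keep the second dimension
    else:
        raise ValueError("dim must be None, 0, 1, or 2")
    return (chrom, start + sshift, end + eshift)


# Expand a 1D window by 500 bp on each side.
print(modify_iterator_interval(("1", 1000, 2000), sshift=-500, eshift=500))
# Project the second dimension of a 2D interval.
print(modify_iterator_interval(("1", 0, 5000, "2", 100, 300), dim=2))
```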

PARAMETER DESCRIPTION
vtrack_name

Name of an existing virtual track.

TYPE: str

dim

Dimension projection for 2D iterators:

  • None or 0 -- no conversion; shifts apply to the 1D iterator directly.
  • 1 -- convert a 2D iterator interval (chrom1, start1, end1, chrom2, start2, end2) to (chrom1, start1, end1) before applying shifts.
  • 2 -- convert to (chrom2, start2, end2) before applying shifts.

TYPE: int or None DEFAULT: None

sshift

Value added to the start coordinate of each iterator interval. Negative values expand the window upstream.

TYPE: int DEFAULT: 0

eshift

Value added to the end coordinate of each iterator interval. Positive values expand the window downstream.

TYPE: int DEFAULT: 0

RETURNS DESCRIPTION
None

RAISES DESCRIPTION
KeyError

If no virtual track with the given name exists.

See Also

  • gvtrack_create : Create a new virtual track.
  • gvtrack_iterator_2d : Set 2D iterator shifts for a virtual track.
  • gvtrack_filter : Attach a genomic mask filter.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()

Shift the evaluation window 200 bp downstream:

>>> pm.gvtrack_create("vt1", "dense_track", func="avg")
>>> pm.gvtrack_iterator("vt1", sshift=200, eshift=200)
>>> pm.gextract("dense_track", "vt1",
...             pm.gintervals(["1"], [0], [500]))

Expand the window symmetrically by 500 bp in each direction:

>>> pm.gvtrack_create("vt2", "dense_track", func="sum")
>>> pm.gvtrack_iterator("vt2", sshift=-500, eshift=500)

Project dimension 1 of a 2D iterator for a 1D virtual track:

>>> pm.gvtrack_create("vt3", "dense_track", func="avg")
>>> pm.gvtrack_iterator("vt3", dim=1)

pymisha.gvtrack_iterator_2d

gvtrack_iterator_2d(vtrack_name, sshift1=0, eshift1=0, sshift2=0, eshift2=0)

Define modification rules for the 2D iterator of a virtual track.

Sets independent start/end shifts for both dimensions of a 2D iterator interval. The shifts are added to the coordinates of each 2D iterator interval before the virtual track function is evaluated.
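
A minimal sketch of how the four shifts act on one 2D iterator interval, assuming intervals are plain tuples; shift_interval_2d is a hypothetical helper, not a pymisha function, and boundary clamping is not modeled.

```python
def shift_interval_2d(interval, sshift1=0, eshift1=0, sshift2=0, eshift2=0):
    """Add each shift to the matching coordinate of a 2D interval."""
    chrom1, start1, end1, chrom2, start2, end2 = interval
    return (
        chrom1, start1 + sshift1, end1 + eshift1,
        chrom2, start2 + sshift2, end2 + eshift2,
    )


# Shifting only the first dimension leaves the second one untouched.
print(shift_interval_2d(("1", 0, 5000, "2", 0, 5000), sshift1=1000, eshift1=2000))
```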

PARAMETER DESCRIPTION
vtrack_name

Name of an existing virtual track.

TYPE: str

sshift1

Value added to the start1 coordinate of each 2D iterator interval.

TYPE: int DEFAULT: 0

eshift1

Value added to the end1 coordinate of each 2D iterator interval.

TYPE: int DEFAULT: 0

sshift2

Value added to the start2 coordinate of each 2D iterator interval.

TYPE: int DEFAULT: 0

eshift2

Value added to the end2 coordinate of each 2D iterator interval.

TYPE: int DEFAULT: 0

RETURNS DESCRIPTION
None

RAISES DESCRIPTION
KeyError

If no virtual track with the given name exists.

See Also

  • gvtrack_create : Create a new virtual track.
  • gvtrack_iterator : Set 1D iterator shifts or project a 2D dimension.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> pm.gvtrack_create("vt_2d", "rects_track", func="area")
>>> pm.gvtrack_iterator_2d("vt_2d", sshift1=1000, eshift1=2000)
>>> pm.gextract("rects_track", "vt_2d",
...             pm.gintervals_2d(["1"], [0], [5000], ["2"], [0], [5000]))

pymisha.gvtrack_filter

gvtrack_filter(vtrack_name, mask=None, **kwargs)

Attach or clear a genomic mask filter on a virtual track.

When a filter is attached, the virtual track function is evaluated only over the unmasked regions -- that is, regions NOT covered by the filter intervals. Masked positions are excluded from aggregation, and an iterator interval that is entirely masked returns NaN. The filter persists on the virtual track until explicitly cleared.

Filters are applied after iterator modifiers (sshift/eshift/dim). The order of operations is: (1) apply iterator shifts, (2) subtract mask from the shifted intervals, (3) evaluate the virtual track function over the remaining unmasked segments.
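
The three steps can be sketched in plain Python for a single chromosome. This is a conceptual model under stated assumptions, not pymisha's implementation: the source is a hypothetical list with one value per base pair, intervals are half-open (start, end) pairs that stay inside that list, and subtract_mask and filtered_avg are illustrative names.

```python
import math


def subtract_mask(start, end, mask):
    """Return the parts of [start, end) not covered by any mask interval."""
    segments = []
    cursor = start
    for m_start, m_end in sorted(mask):
        if m_end <= cursor:
            continue                      # mask lies before the open region
        if m_start >= end:
            break                         # remaining masks lie past the window
        if m_start > cursor:
            segments.append((cursor, m_start))
        cursor = max(cursor, m_end)
    if cursor < end:
        segments.append((cursor, end))
    return segments


def filtered_avg(values, start, end, mask, sshift=0, eshift=0):
    """Average values over the shifted window, skipping masked positions."""
    start, end = start + sshift, end + eshift        # (1) iterator shifts
    segments = subtract_mask(start, end, mask)       # (2) subtract the mask
    kept = [v for s, e in segments for v in values[s:e]]
    return sum(kept) / len(kept) if kept else math.nan  # (3) evaluate


values = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
print(filtered_avg(values, 2, 6, mask=[(3, 5)]))
print(filtered_avg(values, 2, 6, mask=[(3, 5)], eshift=2))
print(filtered_avg(values, 2, 6, mask=[(0, 10)]))
```

Because the mask is subtracted after shifting, widening the window with eshift brings in unmasked positions beyond the original interval, while a fully masked window yields NaN.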

PARAMETER DESCRIPTION
vtrack_name

Name of an existing virtual track.

TYPE: str

mask

The genomic mask to apply. Accepted forms:

  • A pandas.DataFrame with columns chrom, start, end -- intervals to mask.
  • A str naming an intervals set in the database.
  • A str naming an intervals-type (sparse) track.
  • A list or tuple of any combination of the above; all sources are unified into a single mask.
  • None -- clears any existing filter from the virtual track.

TYPE: pandas.DataFrame, str, list, or None DEFAULT: None

filter

Backward-compatible alias for mask.

TYPE: pandas.DataFrame, str, list, or None

RETURNS DESCRIPTION
None

RAISES DESCRIPTION
KeyError

If no virtual track with the given name exists.

ValueError

If a string filter source is not a recognized intervals set or intervals-type track, or if a DataFrame is missing required columns.

See Also

  • gvtrack_create : Create a virtual track (filter can also be set at creation time via the filter keyword argument).
  • gvtrack_info : Inspect a virtual track's configuration including its filter.
  • gvtrack_iterator : Set iterator shifts (applied before the filter).

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()

Attach a filter to exclude specific regions:

>>> pm.gvtrack_create("vt1", "dense_track", func="avg")
>>> mask = pm.gintervals(["1", "1"], [100, 500], [200, 600])
>>> pm.gvtrack_filter("vt1", filter=mask)
>>> pm.gvtrack_info("vt1")["filter"] is not None
True

Clear the filter:

>>> pm.gvtrack_filter("vt1", filter=None)
>>> pm.gvtrack_info("vt1")["filter"] is None
True

Use multiple filter sources (automatically unified):

>>> mask1 = pm.gintervals(["1"], [100], [200])
>>> mask2 = pm.gintervals(["1"], [500], [600])
>>> pm.gvtrack_filter("vt1", filter=[mask1, mask2])
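
What "unified into a single mask" amounts to can be sketched as an interval union. The helper below is hypothetical and works on plain (chrom, start, end) tuples rather than DataFrames; it also assumes that touching intervals are merged, which may differ from pymisha's internal representation.

```python
def unify_masks(*sources):
    """Merge several lists of half-open (chrom, start, end) intervals.

    Returns one sorted list in which no two intervals overlap or touch.
    """
    merged = []
    for chrom, start, end in sorted(iv for source in sources for iv in source):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            # Extend the previous interval instead of adding a new one.
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


mask1 = [("1", 100, 300)]
mask2 = [("1", 200, 400), ("2", 0, 50)]
print(unify_masks(mask1, mask2))
```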

pymisha.gvtrack_rm

gvtrack_rm(vtrack_name)

Remove a virtual track.

Deletes a single virtual track from the current session. If the named virtual track does not exist, the call is silently ignored (no error is raised).

PARAMETER DESCRIPTION
vtrack_name

Name of the virtual track to remove.

TYPE: str

RETURNS DESCRIPTION
None

See Also

  • gvtrack_create : Create a new virtual track.
  • gvtrack_clear : Remove all virtual tracks at once.
  • gvtrack_ls : List all virtual tracks.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> pm.gvtrack_clear()
>>> pm.gvtrack_create("vt1", "dense_track", func="max")
>>> pm.gvtrack_create("vt2", "dense_track", func="min")
>>> pm.gvtrack_ls()
['vt1', 'vt2']
>>> pm.gvtrack_rm("vt1")
>>> pm.gvtrack_ls()
['vt2']

Removing a non-existent track is a no-op:

>>> pm.gvtrack_rm("does_not_exist")

pymisha.gvtrack_clear

gvtrack_clear()

Remove all virtual tracks.

Clears the entire virtual track registry for the current session. After this call, gvtrack_ls() returns an empty list. This is useful for resetting state between analyses or in test fixtures.

RETURNS DESCRIPTION
None

See Also

  • gvtrack_rm : Remove a single virtual track by name.
  • gvtrack_ls : List all virtual tracks.
  • gvtrack_create : Create a new virtual track.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> pm.gvtrack_clear()
>>> pm.gvtrack_create("vt1", "dense_track", func="avg")
>>> pm.gvtrack_create("vt2", "dense_track", func="max")
>>> len(pm.gvtrack_ls())
2
>>> pm.gvtrack_clear()
>>> pm.gvtrack_ls()
[]
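
For the test-fixture use case, one possible pattern is a small context manager that clears the registry on entry and again on exit. The sketch is hypothetical: clean_vtracks is not part of pymisha, and it takes the clearing function as an argument (pm.gvtrack_clear in a real session) so that it can be demonstrated here with a stand-in.

```python
from contextlib import contextmanager


@contextmanager
def clean_vtracks(clear):
    """Run a block with an empty virtual track registry before and after.

    clear -- zero-argument callable that empties the registry,
             e.g. pm.gvtrack_clear in a real session
    """
    clear()
    try:
        yield
    finally:
        clear()  # runs even if the block raises


# Demonstration with a stand-in that records each call.
calls = []
with clean_vtracks(lambda: calls.append("clear")):
    calls.append("body")
print(calls)
```

In a real session the same block would read `with clean_vtracks(pm.gvtrack_clear): ...`, so virtual tracks defined inside it do not leak into later analyses.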