Data Operations

Functions for extracting, summarizing, and analyzing track data across genomic intervals, including statistical summaries, distributions, correlations, and segmentation.

pymisha.gextract

gextract(expr, intervals=None, iterator=None, colnames=None, band=None, vars=None, **kwargs)

Return evaluated track expression values for each iterator interval.

For each interval in the iterator, evaluates one or more track expressions and returns the results as a DataFrame with interval coordinates and expression values. An intervalID column maps each output row back to the input interval.

If input intervals overlap, overlapped coordinates appear multiple times. The order of results may differ from input interval order; use intervalID to match rows to original intervals.

PARAMETER DESCRIPTION
expr

One or more track expressions to evaluate.

TYPE: str or list of str

intervals

Genomic scope (chrom/start/end DataFrame or intervals set name). If None, uses ALLGENOME. For 2D tracks, pass 2D intervals (with chrom1/start1/end1/chrom2/start2/end2 columns).

TYPE: DataFrame or str DEFAULT: None

colnames

Column names for expression values. Must match the number of expressions. If None, uses expression strings.

TYPE: list of str DEFAULT: None

iterator

Track expression iterator. If None, determined from expressions. For multi-expression 2D extraction, pass an explicit iterator.

TYPE: int or str DEFAULT: None

band

Diagonal band for 2D track extraction as (d1, d2). Only applicable with 2D intervals.

TYPE: tuple of (int, int) DEFAULT: None

vars

Explicit variable bindings for the expression. When provided, these are used instead of auto-capturing the caller's namespace.

TYPE: dict DEFAULT: None

**kwargs

Additional keyword arguments:

  • file (str, optional) -- Path to write extraction results as a tab-separated file. When provided, the result is written to the file and None is returned instead of a DataFrame.
  • intervals_set_out (str, optional) -- Name of an interval set to save the result coordinate columns to. The interval set is created via gintervals_save.
  • progress (bool or str, optional) -- Whether to show a progress bar.
  • progress_desc (str, optional) -- Description for the progress bar (default 'gextract').

DEFAULT: {}

RETURNS DESCRIPTION
DataFrame or None

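DataFrame with columns: chrom, start, end, one column per track expression (named by colnames, or by the expression string when colnames is None), and intervalID. Returns None if the iterator produces no intervals, or if file is specified.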

See Also

  • gsummary : Summarize track expression over intervals.
  • gquantiles : Compute quantiles of track expression over intervals.
  • gdist : Compute distribution of track expression over intervals.
  • glookup : Look up track values at specific positions.
  • gscreen : Find intervals where a logical expression is True.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> result = pm.gextract("dense_track", intervals=pm.gintervals("1", 0, 1000),
...                      iterator=200, progress=False)
>>> result.columns.tolist()
['chrom', 'start', 'end', 'dense_track', 'intervalID']
>>> len(result)
5
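The row count in the example above follows from the fixed-bin iterator: a 1000 bp interval with iterator=200 is tiled into five bins. The following is a minimal pure-Python sketch of that tiling, not pymisha's implementation. The helper `fixed_bins` is hypothetical, and it assumes bins are aligned to multiples of the bin size and clipped to the interval; the `interval_id` it emits is simply the position of the interval in the input list.

```python
def fixed_bins(intervals, binsize):
    """Tile each (chrom, start, end) interval into fixed-size bins.

    Hypothetical helper for illustration only. Assumes bins are aligned to
    multiples of `binsize` and clipped to the interval boundaries.
    """
    rows = []
    for interval_id, (chrom, start, end) in enumerate(intervals):
        pos = start
        while pos < end:
            # End of the aligned bin containing `pos`, clipped to the interval.
            bin_end = min((pos // binsize + 1) * binsize, end)
            rows.append((chrom, pos, bin_end, interval_id))
            pos = bin_end
    return rows


rows = fixed_bins([("1", 0, 1000)], 200)
print(len(rows))  # 5

# Overlapping input intervals yield the shared coordinates once per interval.
overlap_rows = fixed_bins([("1", 0, 400), ("1", 200, 600)], 200)
print(overlap_rows)
```

In the second call the bin 200-400 appears twice, once for each input interval, which is why the identifier column is needed to match rows back to their source.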

pymisha.gscreen

gscreen(expr, intervals=None, vars=None, **kwargs)

Find intervals where a logical track expression is True.

Evaluates a logical track expression and returns all intervals where the expression value is True (non-zero). Adjacent True intervals on the same chromosome are merged into a single interval.

PARAMETER DESCRIPTION
expr

Logical track expression.

TYPE: str

intervals

Genomic scope (chrom/start/end DataFrame or intervals set name). If None, uses ALLGENOME.

TYPE: DataFrame or str DEFAULT: None

vars

Explicit variable bindings for the expression.

TYPE: dict DEFAULT: None

**kwargs

Additional keyword arguments:

  • iterator (int or str, optional) -- Track expression iterator. If None, determined from expression.
  • progress (bool or str, optional) -- Whether to show a progress bar.
  • progress_desc (str, optional) -- Description for the progress bar (default 'gscreen').

DEFAULT: {}

RETURNS DESCRIPTION
DataFrame or None

DataFrame with columns: chrom, start, end. Returns None if no intervals match the expression.

See Also

  • gextract : Extract track expression values for each interval.
  • gsegment : Segment genome by track expression values.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> result = pm.gscreen("dense_track > 0.2", intervals=pm.gintervals("1", 0, 10000),
...                     progress=False)
>>> "chrom" in result.columns
True
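The merging of adjacent True intervals described above can be pictured with a small pure-Python sketch. It is illustrative only: `merge_true_bins` is a hypothetical helper, and it assumes the input bins are already sorted by chromosome and position.

```python
def merge_true_bins(bins):
    """Merge adjacent bins whose flag is True.

    `bins` is a list of (chrom, start, end, flag) tuples sorted by position.
    Hypothetical helper, not part of pymisha.
    """
    merged = []
    for chrom, start, end, flag in bins:
        if not flag:
            continue
        if merged and merged[-1][0] == chrom and merged[-1][2] == start:
            # Touches the previous True interval on the same chromosome: extend it.
            merged[-1] = (chrom, merged[-1][1], end)
        else:
            merged.append((chrom, start, end))
    return merged


bins = [
    ("1", 0, 100, True),
    ("1", 100, 200, True),
    ("1", 200, 300, False),
    ("1", 300, 400, True),
    ("2", 0, 100, True),
]
screened = merge_true_bins(bins)
print(screened)  # [('1', 0, 200), ('1', 300, 400), ('2', 0, 100)]
```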

pymisha.gsummary

gsummary(expr, intervals=None, iterator=None, vars=None, **kwargs)

Calculate summary statistics of a track expression.

Returns summary statistics: total bins, NaN count, min, max, sum, mean, and standard deviation of the values.

PARAMETER DESCRIPTION
expr

Track expression.

TYPE: str

intervals

Genomic scope. If None, uses ALLGENOME.

TYPE: DataFrame or str DEFAULT: None

iterator

Track expression iterator. If None, determined from expression.

TYPE: int or str DEFAULT: None

vars

Explicit variable bindings for the expression.

TYPE: dict DEFAULT: None

band

Diagonal band for 2D tracks as (d1, d2).

TYPE: tuple of (int, int)

RETURNS DESCRIPTION
Series

Series with index: ["Total intervals", "NaN intervals", "Min", "Max", "Sum", "Mean", "Std dev"].

See Also

gintervals_summary, gbins_summary, gquantiles

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> pm.gsummary("dense_track")
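To make the seven returned fields concrete, here is a pure-Python sketch that computes them for a plain list of values. It is not pymisha's implementation. Two points are assumptions of the sketch: NaN bins are counted in the total, and the standard deviation uses the sample (n - 1) form. The function name `summarize` is hypothetical.

```python
import math


def summarize(values):
    """Compute the seven summary fields for a sequence of floats.

    NaN marks a missing value. Counting NaN bins in the total and using the
    sample (n - 1) standard deviation are assumptions of this sketch.
    """
    total = 0
    valid = []
    for v in values:
        total += 1
        if not math.isnan(v):
            valid.append(v)
    n = len(valid)
    stats = {"Total intervals": total, "NaN intervals": total - n}
    if n == 0:
        # No usable values: every statistic is undefined.
        for key in ("Min", "Max", "Sum", "Mean", "Std dev"):
            stats[key] = math.nan
        return stats
    mean = sum(valid) / n
    if n > 1:
        std = math.sqrt(sum((v - mean) ** 2 for v in valid) / (n - 1))
    else:
        std = math.nan
    stats.update({"Min": min(valid), "Max": max(valid), "Sum": sum(valid),
                  "Mean": mean, "Std dev": std})
    return stats


summary = summarize([1.0, 2.0, 3.0, math.nan])
print(summary["Total intervals"], summary["NaN intervals"])  # 4 1
print(summary["Mean"], summary["Std dev"])                   # 2.0 1.0
```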

pymisha.gquantiles

gquantiles(expr, percentiles=0.5, intervals=None, iterator=None, vars=None, **kwargs)

Calculate quantiles of a track expression.

Computes quantiles for the given percentiles. If data size exceeds the configured limit, data is randomly sampled to fit.

PARAMETER DESCRIPTION
expr

Track expression.

TYPE: str

percentiles

Percentiles in [0, 1] range.

TYPE: array-like DEFAULT: 0.5

intervals

Genomic scope. If None, uses ALLGENOME.

TYPE: DataFrame or str DEFAULT: None

iterator

Track expression iterator. If None, determined from expression.

TYPE: int or str DEFAULT: None

vars

Explicit variable bindings for the expression.

TYPE: dict DEFAULT: None

band

Diagonal band for 2D tracks.

TYPE: tuple of (int, int)

RETURNS DESCRIPTION
ndarray

Array of quantile values corresponding to the given percentiles.

See Also

gintervals_quantiles, gbins_quantiles, gdist

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> pm.gquantiles("dense_track", [0.25, 0.5, 0.75])

pymisha.gdist

gdist(*args, intervals=None, include_lowest=False, iterator=None, band=None, dataframe=False, names=None, vars=None, **kwargs)

Calculate distribution of track expressions over bins.

This function calculates the distribution of values of numeric track expressions over the given set of bins using a memory-efficient C++ streaming implementation.

PARAMETER DESCRIPTION
*args

Alternating track expressions and their bin breaks. Example: gdist("track1", [0, 0.5, 1], "track2", [0, 1, 2])

TYPE: pairs of (expr, breaks) DEFAULT: ()

intervals

Genomic scope for which the function is applied. If None, uses all genomic intervals.

TYPE: DataFrame DEFAULT: None

include_lowest

If True, the lowest value will be included in the first bin. Example: breaks=[0, 0.2, 0.5] creates (0, 0.2], (0.2, 0.5]. With include_lowest=True: [0, 0.2], (0.2, 0.5].

TYPE: bool DEFAULT: False

iterator

Track expression iterator. If None, determined implicitly.

TYPE: int or str DEFAULT: None

band

Track expression band (not yet supported).

TYPE: optional DEFAULT: None

dataframe

If True, return a DataFrame instead of an N-dimensional array.

TYPE: bool DEFAULT: False

names

Column names for the expressions in the returned DataFrame (only relevant when dataframe=True).

TYPE: list of str DEFAULT: None

vars

Explicit variable bindings for the expression.

TYPE: dict DEFAULT: None

RETURNS DESCRIPTION
ndarray or DataFrame

If dataframe=False: N-dimensional array where N is the number of expr-breaks pairs. The shape is (n_bins_1, n_bins_2, ..., n_bins_N). If dataframe=True: DataFrame with columns for each track expression (bin labels) and an 'n' column with counts.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()

Calculate distribution of dense_track for bins (0, 0.2], (0.2, 0.5], (0.5, 1]:

>>> pm.gdist("dense_track", [0, 0.2, 0.5, 1])

Calculate 2D distribution - dense_track vs sparse_track:

>>> pm.gdist("dense_track", [0, 0.5, 1], "sparse_track", [0, 1, 2],
...          iterator=100)

Get result as DataFrame:

>>> pm.gdist("dense_track", [0, 0.2, 0.5, 1], dataframe=True)

See Also

gsummary, gquantiles, gpartition
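The half-open bin convention and the shape of the returned array can be illustrated with a short pure-Python sketch. It is not the C++ streaming implementation. `bin_index` and `dist2d` are hypothetical helpers, and skipping NaN and out-of-range values is an assumption of the sketch.

```python
import math
from bisect import bisect_left


def bin_index(value, breaks, include_lowest=False):
    """Return the 0-based index of the (lo, hi] bin holding `value`, or None."""
    if math.isnan(value):
        return None
    if include_lowest and value == breaks[0]:
        return 0
    # bisect_left finds the first break >= value, so subtracting 1 gives
    # the bin whose right edge is that break.
    idx = bisect_left(breaks, value) - 1
    return idx if 0 <= idx < len(breaks) - 1 else None


def dist2d(pairs, breaks1, breaks2, include_lowest=False):
    """Count value pairs into an (n_bins_1, n_bins_2) table of nested lists."""
    counts = [[0] * (len(breaks2) - 1) for _ in range(len(breaks1) - 1)]
    for v1, v2 in pairs:
        i = bin_index(v1, breaks1, include_lowest)
        j = bin_index(v2, breaks2, include_lowest)
        if i is not None and j is not None:
            counts[i][j] += 1
    return counts


print(bin_index(0.0, [0, 0.2, 0.5]))                       # None
print(bin_index(0.0, [0, 0.2, 0.5], include_lowest=True))  # 0
counts = dist2d([(0.1, 0.5), (0.7, 1.5), (0.7, 1.0), (2.0, 0.5)],
                [0, 0.5, 1], [0, 1, 2])
print(counts)  # [[1, 0], [1, 1]]
```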

pymisha.gpartition

gpartition(expr, breaks, intervals=None, include_lowest=False, iterator=None, **kwargs)

Partition track expression values into bins and return corresponding intervals.

Converts track expression values to 1-based bin indices according to 'breaks', then returns the intervals with their corresponding bin index. Adjacent intervals with the same bin value are merged.

The range of bins is determined by 'breaks' argument. For example: breaks=[x1, x2, x3, x4] represents three bins: (x1, x2], (x2, x3], (x3, x4].

If 'include_lowest' is True, the lowest value is included in the first bin: [x1, x2], (x2, x3], (x3, x4].

PARAMETER DESCRIPTION
expr

Track expression to evaluate.

TYPE: str

breaks

Break points that determine the bins. Must have at least 2 elements and be strictly increasing.

TYPE: array-like

intervals

Genomic scope for which the function is applied. If None, uses all genomic intervals.

TYPE: DataFrame DEFAULT: None

include_lowest

If True, the lowest value of the range is included in the first bin.

TYPE: bool DEFAULT: False

iterator

Track expression iterator. If None, determined implicitly.

TYPE: int or str DEFAULT: None

band

Track expression band (not yet supported).

TYPE: optional

RETURNS DESCRIPTION
DataFrame or None

DataFrame with columns 'chrom', 'start', 'end', 'bin' where 'bin' is the 1-based bin index. Returns None if no values fall within the breaks. Adjacent intervals with the same bin are merged.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()

Partition dense_track values into 4 bins:

>>> breaks = [0, 0.05, 0.1, 0.15, 0.2]
>>> pm.gpartition("dense_track", breaks, pm.gintervals("1", 0, 5000))

See Also

gdist, gsummary, gextract

Notes

Values outside the break range are excluded from the result. NaN values are also excluded.
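A pure-Python sketch of the behavior described in this entry: values are mapped to 1-based bin indices, NaN and out-of-range values are dropped, and adjacent intervals sharing a bin are merged. The helper `partition` is hypothetical and only mimics the documented semantics on an in-memory list of bins.

```python
import math
from bisect import bisect_left


def partition(bins, breaks, include_lowest=False):
    """Assign 1-based bin indices and merge adjacent equal-bin intervals.

    `bins` is a list of (chrom, start, end, value) tuples sorted by position.
    Hypothetical helper, not part of pymisha.
    """
    out = []
    for chrom, start, end, value in bins:
        if math.isnan(value):
            continue
        if include_lowest and value == breaks[0]:
            idx = 1
        else:
            # For a value in (breaks[i], breaks[i + 1]] this returns i + 1,
            # which is already the 1-based bin index.
            idx = bisect_left(breaks, value)
        if not 1 <= idx <= len(breaks) - 1:
            continue  # outside the break range
        prev = out[-1] if out else None
        if prev and prev[0] == chrom and prev[2] == start and prev[3] == idx:
            out[-1] = (chrom, prev[1], end, idx)
        else:
            out.append((chrom, start, end, idx))
    return out


breaks = [0, 0.05, 0.1, 0.15, 0.2]
bins = [
    ("1", 0, 50, 0.03),
    ("1", 50, 100, 0.04),
    ("1", 100, 150, 0.12),
    ("1", 150, 200, math.nan),
    ("1", 200, 250, 0.13),
    ("1", 250, 300, 0.5),
]
parts = partition(bins, breaks)
print(parts)  # [('1', 0, 100, 1), ('1', 100, 150, 3), ('1', 200, 250, 3)]
```

The two bin-3 intervals stay separate because the NaN bin between them is excluded, so they are no longer adjacent.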

pymisha.gsample

gsample(expr, n, intervals=None, iterator=None)

Sample values from a track expression using reservoir sampling.

Randomly samples n values from the track expression over the given intervals. The sampling is performed in a single streaming pass using a reservoir sampler, so it is memory-efficient regardless of the number of genomic positions scanned.

PARAMETER DESCRIPTION
expr

Track expression.

TYPE: str

n

Number of samples to draw.

TYPE: int

intervals

Genomic scope. If None, uses gintervals_all().

TYPE: DataFrame DEFAULT: None

iterator

Iterator policy for binning the intervals.

TYPE: int or str DEFAULT: None

RETURNS DESCRIPTION
ndarray

1-D array of sampled values (float64). Length may be less than n if fewer non-NaN data points exist.

RAISES DESCRIPTION
ValueError

If n < 1.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> samples = pm.gsample("dense_track", 100)
>>> len(samples)
100
>>> samples = pm.gsample("dense_track", 50,
...                      pm.gintervals(1, 0, 10000))
>>> len(samples)
50

See Also

gextract, gsummary, gquantiles
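For readers unfamiliar with reservoir sampling, the following pure-Python sketch shows the classic single-pass scheme (Algorithm R). Whether pymisha uses exactly this variant is not stated here; the function `reservoir_sample` and its `rng` parameter are hypothetical, and skipping NaN values mirrors the return-value note above.

```python
import math
import random


def reservoir_sample(values, n, rng=None):
    """Draw up to `n` non-NaN values from an iterable in one pass."""
    if n < 1:
        raise ValueError("n must be >= 1")
    rng = rng or random.Random()
    reservoir = []
    seen = 0  # number of non-NaN values consumed so far
    for v in values:
        if math.isnan(v):
            continue
        seen += 1
        if len(reservoir) < n:
            reservoir.append(v)
        else:
            # Keep the new value with probability n / seen by picking a
            # uniform slot in [0, seen) and replacing only if it is < n.
            j = rng.randrange(seen)
            if j < n:
                reservoir[j] = v
    return reservoir


stream = (float(i) for i in range(10_000))
sample = reservoir_sample(stream, 100, rng=random.Random(0))
print(len(sample))  # 100

short = reservoir_sample([1.0, math.nan, 2.0], 5)
print(short)  # [1.0, 2.0]
```

Memory use is bounded by `n` regardless of how long the stream is, which is the property the description above relies on.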

pymisha.gcor

gcor(*exprs, intervals=None, iterator=None, method='pearson', details=False, names=None)

Compute correlation between pairs of track expressions.

Calculates correlation in a single streaming pass over the data, making it memory-efficient for genome-wide computations. Supports multitasking via chromosome partitioning when enabled.

PARAMETER DESCRIPTION
*exprs

An even number of track expressions. Each consecutive pair (expr1, expr2) defines one correlation to compute.

TYPE: str DEFAULT: ()

intervals

Genomic scope. If None, uses gintervals_all().

TYPE: DataFrame DEFAULT: None

iterator

Iterator policy.

TYPE: int or str DEFAULT: None

method

Correlation method. "pearson" computes Pearson correlation in a streaming pass. "spearman" computes an approximate, memory-bounded Spearman correlation using reservoir sampling. "spearman.exact" computes exact Spearman correlation with average-rank ties (requires O(n) memory in number of non-NaN bins).

TYPE: {"pearson", "spearman", "spearman.exact"} DEFAULT: "pearson"

details

If True, return a DataFrame with full statistics (cor, cov, mean1, mean2, sd1, sd2, n, n.na) for Pearson, or (n, n.na, cor) for Spearman methods, instead of just correlation values.

TYPE: bool DEFAULT: False

names

Names for each correlation pair. If None, names are generated as "expr1~expr2".

TYPE: list of str DEFAULT: None

RETURNS DESCRIPTION
ndarray or DataFrame

If details=False: 1-D array of correlation values, one per pair. If details=True: DataFrame with rows per pair and columns for all statistics.

RAISES DESCRIPTION
ValueError

If an odd number of expressions is provided.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> pm.gcor("dense_track", "sparse_track",
...         intervals=pm.gintervals(1, 0, 10000), iterator=1000)
array([...])
>>> pm.gcor("dense_track", "sparse_track",
...         intervals=pm.gintervals(1, 0, 10000), iterator=1000,
...         details=True)
          cor       cov     mean1  ...
0  ...
>>> pm.gcor("dense_track", "sparse_track",
...         intervals=pm.gintervals(1, 0, 10000), iterator=1000,
...         method="spearman")
array([...])
>>> pm.gcor("dense_track", "sparse_track",
...         intervals=pm.gintervals(1, 0, 10000), iterator=1000,
...         method="spearman.exact", details=True)
          n  n.na       cor
0  ...

See Also

gsummary, gextract
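A single-pass Pearson correlation can be computed with running means and co-moments, so no values need to be stored. The sketch below shows one common formulation (a Welford-style update). It is not taken from the pymisha source; `streaming_pearson` is a hypothetical name, and counting a pair as NaN when either value is NaN is an assumption.

```python
import math


def streaming_pearson(pairs):
    """Pearson correlation of (x, y) pairs in one pass with O(1) memory."""
    n = n_na = 0
    mean_x = mean_y = 0.0
    m2_x = m2_y = c_xy = 0.0  # running sums of squared and cross deviations
    for x, y in pairs:
        if math.isnan(x) or math.isnan(y):
            n_na += 1
            continue
        n += 1
        dx = x - mean_x  # deviation from the previous mean
        dy = y - mean_y
        mean_x += dx / n
        mean_y += dy / n
        m2_x += dx * (x - mean_x)
        m2_y += dy * (y - mean_y)
        c_xy += dx * (y - mean_y)
    denom = math.sqrt(m2_x * m2_y)
    cor = c_xy / denom if denom > 0 else math.nan
    return {"cor": cor, "n": n, "n.na": n_na}


stats = streaming_pearson([(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (math.nan, 1.0)])
print(stats)  # {'cor': 1.0, 'n': 3, 'n.na': 1}
```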

pymisha.gbins_summary

gbins_summary(*args, expr=None, intervals=None, include_lowest=False, iterator=None, band=None, **kwargs)

Compute summary statistics per bin.

PARAMETER DESCRIPTION
*args

Alternating track expressions and bin break vectors.

TYPE: pairs of (bin_expr, breaks) DEFAULT: ()

expr

Track expression to summarize. If None the first bin expression is used.

TYPE: str DEFAULT: None

intervals

Genomic scope. Defaults to all intervals.

TYPE: DataFrame DEFAULT: None

include_lowest

Include the left edge of the first bin.

TYPE: bool DEFAULT: False

iterator

Track expression iterator.

TYPE: int or str DEFAULT: None

band

Diagonal band (d1, d2) for 2D interval filtering.

TYPE: tuple of (float, float) DEFAULT: None

RETURNS DESCRIPTION
ndarray

Shape (*n_bins, 7) where the last axis holds: [Total intervals, NaN intervals, Min, Max, Sum, Mean, Std dev].

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> pm.gbins_summary("dense_track", [0, 0.2, 0.4, 2], expr="sparse_track",
...                  intervals=pm.gintervals(1), iterator="dense_track")

See Also

gsummary, gintervals_summary, gdist

pymisha.gbins_quantiles

gbins_quantiles(*args, expr=None, percentiles=0.5, intervals=None, include_lowest=False, iterator=None, band=None, **kwargs)

Compute quantiles per bin.

PARAMETER DESCRIPTION
*args

Alternating track expressions and bin break vectors.

TYPE: pairs of (bin_expr, breaks) DEFAULT: ()

expr

Track expression to compute quantiles for. If None the first bin expression is used.

TYPE: str DEFAULT: None

percentiles

Percentile(s) in [0, 1].

TYPE: float or array-like DEFAULT: 0.5

intervals

Genomic scope. Defaults to all intervals.

TYPE: DataFrame DEFAULT: None

include_lowest

Include the left edge of the first bin.

TYPE: bool DEFAULT: False

iterator

Track expression iterator.

TYPE: int or str DEFAULT: None

band

Diagonal band (d1, d2) for 2D interval filtering.

TYPE: tuple of (float, float) DEFAULT: None

RETURNS DESCRIPTION
ndarray

Shape (*n_bins, n_percentiles).

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> pm.gbins_quantiles("dense_track", [0, 0.2, 0.4, 2],
...                    expr="sparse_track", percentiles=[0.2, 0.5],
...                    intervals=pm.gintervals(1), iterator="dense_track")

See Also

gquantiles, gintervals_quantiles, gdist

pymisha.gintervals_summary

gintervals_summary(expr, intervals, iterator=None, **kwargs)

Compute summary statistics for each interval.

PARAMETER DESCRIPTION
intervals_set_out

When provided, saves the resulting intervals set via gintervals_save and returns None.

TYPE: str

RETURNS DESCRIPTION
DataFrame or None

DataFrame with the original interval columns (chrom, start, end) plus summary statistic columns: Total intervals, NaN intervals, Min, Max, Sum, Mean, Std dev. Returns None if intervals_set_out is provided (result is saved to disk instead).

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> intervs = pm.gintervals([1, 2], 0, 5000)
>>> pm.gintervals_summary("dense_track", intervs)

See Also

gsummary, gbins_summary

pymisha.gintervals_quantiles

gintervals_quantiles(expr, percentiles=0.5, intervals=None, iterator=None, **kwargs)

Compute quantiles for each interval.

PARAMETER DESCRIPTION
intervals_set_out

When provided, saves the resulting intervals set via gintervals_save and returns None.

TYPE: str

RETURNS DESCRIPTION
DataFrame or None

DataFrame with the original interval columns (chrom, start, end) plus one column per requested percentile, named by the percentile value (e.g., "0.5"). Returns None if intervals_set_out is provided (result is saved to disk instead).

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> intervs = pm.gintervals([1, 2], 0, 5000)
>>> pm.gintervals_quantiles("dense_track", percentiles=[0.5, 0.3, 0.9],
...                         intervals=intervs)

See Also

gquantiles, gbins_quantiles

pymisha.gcis_decay

gcis_decay(expr, breaks, src, domain, intervals=None, include_lowest=False, iterator=None, band=None)

Calculate distribution of cis contact distances.

For contacts where chrom1 equals chrom2 and the first interval (I1) is fully within src intervals, this function bins the distance between I1 and I2 separately for intra-domain and inter-domain contacts.

A contact is intra-domain when both I1 and I2 are fully contained within the same domain interval. Otherwise it is inter-domain.

The distance is abs((start1 + end1 - start2 - end2) / 2) (integer division), i.e. the absolute difference of the interval midpoints.

PARAMETER DESCRIPTION
expr

A 2D track expression (must be a simple 2D track name).

TYPE: str

breaks

Sorted break points defining distance bins. Example: breaks=[x1, x2, x3] creates bins (x1, x2] and (x2, x3].

TYPE: array_like

src

Source intervals (chrom, start, end). Only contacts whose I1 is fully within the unified source intervals are counted. Overlapping source intervals are allowed and will be merged.

TYPE: DataFrame

domain

Domain intervals (chrom, start, end). Must be non-overlapping. Used to classify contacts as intra- or inter-domain.

TYPE: DataFrame

intervals

Genomic scope (1D intervals). Defaults to all genome intervals. Only cis contacts (chrom1 == chrom2) within these chromosomes are considered.

TYPE: DataFrame DEFAULT: None

include_lowest

If True, the lowest break value is included in the first bin: [x1, x2] instead of (x1, x2].

TYPE: bool DEFAULT: False

iterator

2D iterator specification. Currently unused (extraction uses the track's native resolution).

TYPE: str DEFAULT: None

band

Diagonal band filter (d1, d2). Only contacts where the diagonal offset falls within the band are considered.

TYPE: tuple of (int, int) DEFAULT: None

RETURNS DESCRIPTION
ndarray

2D array of shape (n_bins, 2) where column 0 is intra-domain counts and column 1 is inter-domain counts. Row and column labels are stored as a breaks attribute on the array.

See Also

  • gdist : General distribution of track expressions.
  • gextract : Extract track values over intervals.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> import pandas as pd
>>> src = pd.DataFrame({"chrom": ["1", "1"], "start": [0, 200000], "end": [100000, 400000]})
>>> domain = pd.DataFrame({"chrom": ["1"], "start": [0], "end": [500000]})
>>> breaks = [0, 100000, 200000, 300000, 400000, 500000]
>>> result = pm.gcis_decay("rects_track", breaks, src, domain)
>>> result.shape[1]
2
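The distance formula and the intra-/inter-domain rule can be checked by hand with a small pure-Python sketch. The helpers `contact_distance` and `classify_contact` are hypothetical. The sketch takes the absolute value before halving so that the result truncates toward zero, which is an assumption about the "integer division" mentioned above.

```python
def contact_distance(start1, end1, start2, end2):
    """Absolute midpoint distance, truncated toward zero.

    Taking abs() before // avoids Python's floor behavior on negative
    numbers; that the library truncates is an assumption of this sketch.
    """
    return abs(start1 + end1 - start2 - end2) // 2


def classify_contact(start1, end1, start2, end2, domains):
    """Return 'intra' if both intervals lie fully inside one domain, else 'inter'.

    `domains` is a list of non-overlapping (start, end) tuples on one chromosome.
    """
    for dom_start, dom_end in domains:
        first_inside = dom_start <= start1 and end1 <= dom_end
        second_inside = dom_start <= start2 and end2 <= dom_end
        if first_inside and second_inside:
            return "intra"
    return "inter"


domains = [(0, 500000)]
print(contact_distance(1000, 1001, 250000, 250001))           # 249000
print(classify_contact(1000, 1001, 250000, 250001, domains))  # intra
print(classify_contact(1000, 1001, 600000, 600001, domains))  # inter
```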

pymisha.gsegment

gsegment(expr, minsegment, maxpval=0.05, onetailed=True, intervals=None, iterator=None, intervals_set_out=None)

Divide track expression into segments using Wilcoxon test.

Divides the values of a track expression into segments, where each segment size is at least minsegment and the P-value of comparing the segment with the first minsegment values from the next segment is at most maxpval. Comparison is done using the Wilcoxon (Mann-Whitney) test.

PARAMETER DESCRIPTION
expr

Track expression.

TYPE: str

minsegment

Minimal segment size in base pairs.

TYPE: int

maxpval

Maximal P-value that separates two adjacent segments. Default 0.05.

TYPE: float DEFAULT: 0.05

onetailed

If True, Wilcoxon test is one-tailed. Default True.

TYPE: bool DEFAULT: True

intervals

Genomic scope. Defaults to all genome intervals.

TYPE: DataFrame DEFAULT: None

iterator

Fixed bin iterator size. If None, determined from track expression.

TYPE: int DEFAULT: None

intervals_set_out

If provided, save result as an intervals set and return None.

TYPE: str DEFAULT: None

RETURNS DESCRIPTION
DataFrame or None

Intervals where each row represents a segment (chrom, start, end). Returns None if intervals_set_out is provided, or if input is empty.

See Also

gwilcox : Sliding-window Wilcoxon test.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> result = pm.gsegment("dense_track", 5000, maxpval=0.0001)
>>> result.columns.tolist()
['chrom', 'start', 'end']

pymisha.gwilcox

gwilcox(expr, winsize1, winsize2, maxpval=0.05, onetailed=True, what2find=1, intervals=None, iterator=None, intervals_set_out=None)

Sliding-window Wilcoxon test over track expression values.

Runs a Wilcoxon test (Mann-Whitney) over the values of a track expression in two sliding windows with an identical center. Returns intervals where the smaller window tested against the larger window gives a P-value below maxpval.

PARAMETER DESCRIPTION
expr

Track expression.

TYPE: str

winsize1

Size of the first sliding window in base pairs.

TYPE: int

winsize2

Size of the second sliding window in base pairs.

TYPE: int

maxpval

Maximal P-value threshold. Default 0.05.

TYPE: float DEFAULT: 0.05

onetailed

If True, Wilcoxon test is one-tailed. Default True.

TYPE: bool DEFAULT: True

what2find

-1 for lows, 1 for peaks, 0 for both. Default 1.

TYPE: int DEFAULT: 1

intervals

Genomic scope. Defaults to all genome intervals.

TYPE: DataFrame DEFAULT: None

iterator

Fixed bin iterator size. If None, determined from track expression.

TYPE: int DEFAULT: None

intervals_set_out

If provided, save result as an intervals set and return None.

TYPE: str DEFAULT: None

RETURNS DESCRIPTION
DataFrame or None

Intervals with pval column where P-value is below maxpval. Returns None if no significant regions found, input is empty, or intervals_set_out is provided.

See Also

gsegment : Divide track expression into segments using Wilcoxon test.

Examples:

>>> import pymisha as pm
>>> _ = pm.gdb_init_examples()
>>> result = pm.gwilcox("dense_track", 100000, 1000, maxpval=0.01, what2find=1)
>>> result is None or "chrom" in result.columns
True
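As background for the test applied to the two windows, here is a pure-Python sketch of the Mann-Whitney U statistic with a two-sided p-value from the normal approximation. It is illustrative only: how pymisha computes its P-values (exact or approximate, with or without tie and continuity corrections) is not specified here, and `mann_whitney_u` is a hypothetical helper that applies neither correction.

```python
import math


def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided p-value via normal approximation)."""
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    i = 0
    while i < len(combined):
        # Find the run of tied values starting at position i.
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if combined[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u, p_two_sided


u_low, p_low = mann_whitney_u([1, 2, 3], [4, 5, 6])
print(u_low)  # 0.0
u_tied, p_tied = mann_whitney_u([1, 1], [1, 1])
print(u_tied, p_tied)  # 2.0 1.0
```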

pymisha.glookup

glookup(lookup_table, *args, intervals=None, include_lowest=False, force_binning=True, iterator=None, band=None, **kwargs)

Look up values in an N-dimensional lookup table indexed by track expressions.

For each iterator interval, evaluates one or more track expressions and uses the resulting values to index into a lookup table. Returns the table value for each interval.

Uses a memory-efficient C++ streaming implementation when expressions do not contain virtual tracks. Falls back to Python (memory-resident) when virtual tracks, a band filter, or 2D intervals are present.

PARAMETER DESCRIPTION
lookup_table

N-dimensional lookup table. The shape must match the number of bins in each dimension. For 1D lookup, shape is (n_bins,). For multi-dimensional lookup, shape is (n_bins_1, n_bins_2, ...).

TYPE: ndarray

*args

Alternating track expressions and break arrays defining bins. Same format as gdist. Example: glookup(table, "track1", breaks1, "track2", breaks2, ...).

TYPE: pairs of (str, array-like) DEFAULT: ()

intervals

Genomic scope for which the function is applied. Required.

TYPE: DataFrame or str DEFAULT: None

include_lowest

If True, the lowest break value is included in the first bin. Example: breaks=[0, 0.2, 0.5] creates (0, 0.2], (0.2, 0.5]. With include_lowest=True: [0, 0.2], (0.2, 0.5].

TYPE: bool DEFAULT: False

force_binning

If True, clamp out-of-range values to the nearest bin instead of NaN. If False, out-of-range values produce NaN.

TYPE: bool DEFAULT: True

iterator

Track expression iterator. If None, determined implicitly.

TYPE: int or str DEFAULT: None

band

Diagonal band for 2D tracks. Triggers Python fallback path.

TYPE: tuple of (int, int) DEFAULT: None

RETURNS DESCRIPTION
DataFrame or None

Intervals with columns: chrom, start, end, value, intervalID. Returns None if intervals is empty.

See Also

  • gdist : Compute distribution over binned track expressions.
  • gtrack_lookup : Create a track from an N-dimensional lookup table.

Examples:

>>> import pymisha as pm
>>> import numpy as np
>>> _ = pm.gdb_init_examples()

One-dimensional lookup:

>>> breaks = [0.1, 0.12, 0.14, 0.16, 0.18, 0.2]
>>> result = pm.glookup([10, 20, 30, 40, 50], "dense_track", breaks,
...                     intervals=pm.gintervals("1", 0, 500))
>>> "value" in result.columns
True
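The effect of force_binning on out-of-range values can be seen in a one-dimensional pure-Python sketch. It only mirrors the documented semantics; `lookup_1d` is a hypothetical helper, and returning NaN for a NaN input is an assumption.

```python
import math
from bisect import bisect_left


def lookup_1d(table, value, breaks, include_lowest=False, force_binning=True):
    """Map `value` to its (lo, hi] bin and return the matching table entry."""
    n_bins = len(breaks) - 1
    if len(table) != n_bins:
        raise ValueError("table length must equal the number of bins")
    if math.isnan(value):
        return math.nan
    if include_lowest and value == breaks[0]:
        idx = 0
    else:
        idx = bisect_left(breaks, value) - 1
    if idx < 0 or idx >= n_bins:
        if not force_binning:
            return math.nan
        # Clamp to the nearest valid bin.
        idx = min(max(idx, 0), n_bins - 1)
    return table[idx]


breaks = [0.1, 0.12, 0.14, 0.16, 0.18, 0.2]
table = [10, 20, 30, 40, 50]
print(lookup_1d(table, 0.13, breaks))                      # 20
print(lookup_1d(table, 0.5, breaks))                       # 50
print(lookup_1d(table, 0.5, breaks, force_binning=False))  # nan
```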

pymisha.gtrack_lookup

gtrack_lookup(track, description, lookup_table, *args, iterator=None, include_lowest=False, force_binning=True, band=None)

Create a track from an N-dimensional lookup table.

Evaluates track expressions genome-wide, looks up values in the table, and creates a new track from the results. Dense or sparse output is determined by the iterator type.

This is the track-creation counterpart of glookup, which returns values in-memory without creating a track.

PARAMETER DESCRIPTION
track

Name for the new track.

TYPE: str

description

Track description.

TYPE: str

lookup_table

N-dimensional lookup table. Shape must match the number of bins defined by each (expr, breaks) pair. For 1D: (n_bins,). For multi-dimensional: (n_bins_1, n_bins_2, ...).

TYPE: ndarray

*args

Alternating track expressions and break arrays defining bins.

TYPE: pairs of (str, array-like) DEFAULT: ()

iterator

Track expression iterator. Integer values create dense tracks with that bin size. Intervals-based iterators create sparse tracks.

TYPE: int or str DEFAULT: None

include_lowest

If True, the lowest break value is included in the first bin.

TYPE: bool DEFAULT: False

force_binning

If True, clamp out-of-range values to the nearest bin. If False, out-of-range values produce NaN in the track.

TYPE: bool DEFAULT: True

band

Diagonal band for 2D tracks. Passed through to glookup.

TYPE: tuple of (int, int) DEFAULT: None

RETURNS DESCRIPTION
None

See Also

  • glookup : Look up values without creating a track.
  • gtrack_create : Create a track from an expression.
  • gdist : Compute distribution over binned track expressions.

Examples:

>>> import pymisha as pm
>>> import numpy as np
>>> _ = pm.gdb_init_examples()

Create a dense track from 1D lookup:

>>> pm.gtrack_lookup("my_track", "lookup track",
...     np.array([10.0, 20.0, 30.0, 40.0]),
...     "dense_track", [0, 0.05, 0.1, 0.15, 0.2],
...     iterator=100)