NEWS.md
- `genome.seq` + `genome.idx` files instead of per-chromosome files
- `gdb.info()`, `gdb.convert_to_indexed()`, `gtrack.convert_to_indexed()`, `gintervals.convert_to_indexed()`, `gintervals.2d.convert_to_indexed()`
- `options(gmulticontig.indexed_format = FALSE)` to create databases in legacy format for compatibility with older misha versions
- `vignette("Database-Formats")` for more details.
- `gmax.processes` automatically set to 70% of available CPU cores
- `gmax.data.size` coordinated with process limits to ensure total memory usage <= 70% of RAM (capped at 10GB per process)
- `gmax.data.size = min((RAM * 0.7) / gmax.processes, 10GB)` ensures safe memory usage across all parallel processes
- `gmax.processes * 1000` records (e.g., 2K on laptops, 89K on 128-core servers)
- `options()`
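The `gmax.processes` and `gmax.data.size` defaults described above can be sketched in base R. This is a minimal illustration of the stated formulas, not misha's actual code: the function names are hypothetical, "10GB" is assumed to mean 10e9 bytes, and the 70% core count is assumed to be rounded down (which is consistent with the "89K on 128-core servers" figure).

```r
# Sketch of the documented defaults; not misha's actual implementation.
# Assumptions: "10GB" means 10e9 bytes, and the 70% core count is rounded down.

default_gmax_processes <- function(n_cores) {
  stopifnot(is.numeric(n_cores), n_cores >= 1)
  # 70% of the available cores, but never fewer than one process
  max(1L, as.integer(floor(n_cores * 0.7)))
}

default_gmax_data_size <- function(ram_bytes, n_processes, cap_bytes = 10e9) {
  stopifnot(ram_bytes > 0, n_processes >= 1)
  # Split 70% of RAM across the processes, then apply the per-process cap
  min((ram_bytes * 0.7) / n_processes, cap_bytes)
}

default_buffer_records <- function(n_processes) {
  # Documented scaling: gmax.processes * 1000 records
  n_processes * 1000
}

procs <- default_gmax_processes(128)   # 89 processes on a 128-core server
default_buffer_records(procs)          # 89000 records
default_gmax_data_size(512e9, procs)   # per-process limit in bytes, below the cap
default_gmax_data_size(1e12, 2)        # hits the 10e9-byte cap
```

Because the per-process limit is derived from the process count, the product `gmax.processes * gmax.data.size` stays at or below 70% of RAM under these assumptions.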
- `vignette("Manual")` for details
- `gvtrack.create` with `src` parameter). These tracks behave exactly like regular sparse tracks, but are stored in memory and can be used in track expressions.
- `sshift`, `eshift` and `filter` parameters to `gvtrack.create`.
- `gintervals.path()` and `gtrack.path()` functions that return the actual file system paths for interval sets and tracks.
- `masked.count` and `masked.frac` virtual track functions that return the count and fraction of masked base pairs (lowercase letters) in the current iterator interval.
- `gtrack.liftover` did not fill chromosomes missing from the chain with NA values. This caused errors when trying to access the tracks afterwards.
- `gintervals.as_chain` function that converts a data frame to a chain object.
- `gintervals.liftover` via `value_col` and `multi_target_agg` parameters.
- `src_overlap_policy` and `tgt_overlap_policy` parameters to `gintervals.liftover`, `gintervals.load_chain`, and `gtrack.liftover` functions.
- `gtrack.liftover` via `multi_target_agg` parameter.
- `gintervals.load_chain` now returns valid misha intervals instead of a chain object.
- `gintervals.load_chain` now includes `score` and `chain_id` columns for all loaded chains
- `min_score` parameter in `gintervals.load_chain`, `gintervals.liftover`, and `gtrack.liftover` filters out low-quality chains
- `tgt_overlap_policy = "auto_score"` (or `"auto"`) selects the best chain mapping based on alignment score (highest score → longest span → lowest `chain_id`)
- `include_metadata` parameter in `gintervals.liftover` optionally returns `score` and `chain_id` for each mapping
- BREAKING: `"auto"` is now an alias for `"auto_score"`.
For the old behavior, use `tgt_overlap_policy = "auto_first"`.
- `canonic` parameter to `gintervals.liftover` (default `FALSE`) to merge adjacent target intervals resulting from the same source interval and chain.
- `tgt_overlap_policy = "best_cluster_union"` (default, aliased as `"best_source_cluster"`): Uses source union coverage
- `tgt_overlap_policy = "best_cluster_sum"`: Uses sum of target lengths
- `tgt_overlap_policy = "best_cluster_max"`: Uses longest single member
- `max.pos.abs`, `max.pos.relative`, `min.pos.abs`, `min.pos.relative`: Returns the position of the maximum/minimum value in the iterator interval
- `exists`: Returns 1 if any value exists (or specific `vals` if provided), 0 otherwise
- `size`: Returns the number of non-NaN values in the iterator interval
- `sample`: Returns a uniformly sampled source value from the iterator interval
- `sample.pos.abs` and `sample.pos.relative`: Returns the position of a uniformly sampled value
- `first` and `last`: Returns the first/last value in the iterator interval
- `first.pos.abs`, `first.pos.relative`, `last.pos.abs`, `last.pos.relative`: Returns the position of the first/last value
- `gintervals.neighbors` when using `mindist=0, maxdist=0`: the function would miss zero-distance (touching) intervals.
- `pwm.count` with spatial sliding windows double-counting bidirectional hits (forward + reverse) at the same genomic position; the sliding path now matches the baseline per-position union semantics.
- `gintervals.load_chain` now returns a data frame with 8 columns instead of 7.
Columns are: `chrom`, `start`, `end`, `strand`, `chromsrc`, `startsrc`, `endsrc`, `strandsrc`.
- `src_overlap_policy` and `tgt_overlap_policy` parameters to `gintervals.load_chain`, `gtrack.liftover` and `gintervals.liftover` functions.
- `neighbor.count` virtual track.
- `gintervals.mark_overlaps` function that marks overlapping intervals with a group ID.
- `pssm` parameter of `gvtrack.create` and `gseq.pwm` functions.
- `gseq.pwm` and added `neutral_chars_policy` parameter.
- `pwm`, `pwm.max` and `pwm.count`) for dense iterators when spatial weighting is disabled, providing significant performance improvements for consecutive genomic intervals.
- `pwm.count(bidirect=TRUE)` now counts per-position union of strands (via log-sum-exp), aligning with `pwm`/`pwm.max`. Each position contributes at most 1 to the count. To reproduce the old per-strand-sum behavior, add the two strand-specific counts: `pwm.count(bidirect=FALSE, strand=1) + pwm.count(bidirect=FALSE, strand=-1)`.
- `gseq.pwm` and `gseq.kmer` functions that compute pwm and kmer scores on sequences without the need for a genome database.
- `gseq.rev` and `gseq.comp` functions that reverse and complement DNA sequences without the need for a genome database.
- `gseq.revcomp` alias for `grevcomp` function.
- `gintervals.random` function that generates random genome intervals.
- `gintervals.covered_bp` and `gintervals.coverage_fraction` functions that calculate the number of base pairs and the fraction of base pairs covered by a set of intervals.
- `pwm.count` virtual track function that counts the number of occurrences of a PWM in the current iterator interval.
- `gintervals.neighbors.upstream()` - Find upstream neighbors relative to query strand
- `gintervals.neighbors.downstream()` - Find downstream neighbors relative to query strand
- `gintervals.neighbors.directional()` - Find both upstream and downstream neighbors
- `use_intervals1_strand` parameter to `gintervals.neighbors()` to use query intervals’ strand for distance directionality.
- `warn.ignored.strand` parameter to `gintervals.neighbors()` to control warnings when query strand is
ignored.
- `gintervals.neighbors`: a stack imbalance in the C++ code in very rare cases of 2D intervals.
- `gintervals.neighbors` due to unbalanced `rprotect` calls.
- `gintervals.normalize` and `gintervals.annotate` functions.
- `m1-asan` build.
- `pwm` and `kmer` virtual track functions: iterator shifts were not applied.
- `colnames` parameter to `gintervals.mapply` function.
- `attrs` parameter to `gtrack.import` function.
- `created.user` default attribute in track creation functions.
- `gtrack.import` function.
- `gtrack.create_dense` function - creates a dense track from intervals and values.
- `clock_gettime` is missing).
- `gtrack.import_bigwig`: `intern` argument was not passed to system calls.
- `grevcomp` function (reverse complement of a DNA sequence).
- `gdb.create_genome` function.
- `R_curErrorBuf`, `SET_TYPEOF`
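For readers unfamiliar with the operation behind the `grevcomp` entry above, reverse complementing can be sketched in base R. The helper name `revcomp_sketch` is hypothetical and this is not misha's implementation. It assumes only `A`, `C`, `G`, `T` (upper or lower case) are complemented and leaves every other character unchanged.

```r
# Illustrative only: a base-R reverse complement, not misha's grevcomp().
# Assumption: only A/C/G/T (either case) are complemented; other characters pass through.
revcomp_sketch <- function(seqs) {
  stopifnot(is.character(seqs))
  vapply(seqs, function(s) {
    # Complement each base, preserving case
    comp <- chartr("ACGTacgt", "TGCAtgca", s)
    # Reverse the complemented string character by character
    paste(rev(strsplit(comp, "", fixed = TRUE)[[1]]), collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

revcomp_sketch(c("AAGC", "ACCTG"))  # "GCTT" "CAGGT"
```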
- `Rf_` prefix in the C++ code.
- `ALLGENOME` is now only soft deprecated in order to support old misha scripts.
- `gtrack.create_dirs` function.
- `gcluster.run`.
- `gintervals.neighbors`.
- `.misha`. Variables such as `ALLGENOME` can now be accessed as `.misha$ALLGENOME`. This change is not backwards compatible; please update your code accordingly.
- `gintervals.neighbors` (same as `gintervals.neighbors1` from `misha.ext`). This means that instead of having two columns each of `chrom`, `start` and `end`, the resulting data frame now has `chrom1`, `start1` and `end1`.
- `gwget` now uses curl in order to work on systems that do not have ftp installed.
- `markdown` format.
- `Genomes` vignette that demonstrates how to create a new genome database.
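The tie-breaking order given for `tgt_overlap_policy = "auto_score"` in the liftover entries (highest score, then longest span, then lowest `chain_id`) can be sketched in base R. This is a sketch of the stated ordering only, not misha's implementation. The helper `pick_best_mapping` is hypothetical, and "span" is assumed here to mean `end - start` of each candidate mapping.

```r
# Sketch of the documented ordering: highest score -> longest span -> lowest chain_id.
# Assumption: "span" is end - start of each candidate target interval.
pick_best_mapping <- function(candidates) {
  stopifnot(
    is.data.frame(candidates), nrow(candidates) > 0,
    all(c("start", "end", "score", "chain_id") %in% names(candidates))
  )
  span <- candidates$end - candidates$start
  # order() breaks ties left to right; negate keys that should sort descending
  ord <- order(-candidates$score, -span, candidates$chain_id)
  candidates[ord[1], , drop = FALSE]
}

# Hypothetical candidates: rows 2 and 3 tie on score and span,
# so the lower chain_id decides
candidates <- data.frame(
  chrom = "chr1",
  start = c(100, 100, 150),
  end = c(200, 300, 350),
  score = c(5000, 9000, 9000),
  chain_id = c(1, 3, 2)
)
pick_best_mapping(candidates)$chain_id  # 2
```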