Datasets
Functions for loading, saving, and managing named datasets within the genomic database.
pymisha.gdataset_load
Load a dataset into the namespace.
Loads tracks and intervals from a dataset directory, making them
available for analysis alongside the working database. If the dataset
contains tracks or intervals whose names collide with objects in the
working database or previously loaded datasets, an error is raised
unless force=True. When collisions are forced, the working
database always wins; for dataset-to-dataset collisions, the
later-loaded dataset overrides earlier ones.
| PARAMETER | DESCRIPTION |
|---|---|
| path | Path to a dataset or misha database directory. |
| force | If True, ignore name collisions (working db wins; later datasets override earlier). |
| verbose | Print loaded track/interval counts. |

| RETURNS | DESCRIPTION |
|---|---|
| dict | Dictionary with keys |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If the dataset path does not exist, lacks a |
See Also
gdataset_unload : Unload a dataset from the namespace.
gdataset_save : Save tracks/intervals as a dataset.
gdataset_ls : List loaded datasets.
Examples:
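The precedence rules described above for forced loads can be illustrated with a small, self-contained model. This is a sketch of the documented behaviour only, not pymisha's implementation: the `resolve` helper, the dataset paths, and the track names are all hypothetical, and nothing here touches a real database.

```python
def resolve(working_db, datasets):
    """Map each visible name to the source that provides it.

    working_db: set of names in the working database.
    datasets:   list of (path, names) pairs in load order.
    """
    visible = {}
    # Later-loaded datasets override earlier ones.
    for path, names in datasets:
        for name in names:
            visible[name] = path
    # The working database always wins over any dataset.
    for name in working_db:
        visible[name] = "working_db"
    return visible


# Hypothetical names and paths, for illustration only.
working = {"dense_track"}
loaded = [
    ("/data/ds_a", {"dense_track", "peaks"}),
    ("/data/ds_b", {"peaks", "annot"}),
]
owners = resolve(working, loaded)
```

In this model `dense_track` stays with the working database, while `peaks` is served by `/data/ds_b` because it was loaded after `/data/ds_a`.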
pymisha.gdataset_unload
Unload a dataset from the namespace.
Removes all tracks and intervals from a previously loaded dataset. If a dataset track was shadowing another, the shadowed track becomes visible again.
| PARAMETER | DESCRIPTION |
|---|---|
| path | Path to a previously loaded dataset. |
| validate | If True, raise an error if the path is not currently loaded. Otherwise silently no-op. |

| RETURNS | DESCRIPTION |
|---|---|
| None | |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If |
See Also
gdataset_load : Load a dataset into the namespace.
gdataset_ls : List loaded datasets.
Examples:
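The shadowing and `validate` behaviour described above can be modelled in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not pymisha's code: the `ToyNamespace` class, its methods, and the paths are hypothetical, and the default value of `validate` in the real function is not given here.

```python
class ToyNamespace:
    """Toy model of loaded datasets; the most recently loaded one is looked up first."""

    def __init__(self):
        self.loaded = []  # (path, names) pairs in load order

    def load(self, path, names):
        self.loaded.append((path, set(names)))

    def unload(self, path, validate=False):
        for i, (loaded_path, _) in enumerate(self.loaded):
            if loaded_path == path:
                del self.loaded[i]
                return None
        if validate:
            # Mirrors the documented error for a path that is not loaded.
            raise ValueError(f"dataset not loaded: {path}")
        return None  # silent no-op when validate is False

    def owner(self, name):
        # Search from the most recently loaded dataset backwards.
        for path, names in reversed(self.loaded):
            if name in names:
                return path
        return None


ns = ToyNamespace()
ns.load("/data/ds_a", ["peaks"])
ns.load("/data/ds_b", ["peaks"])
owner_before = ns.owner("peaks")   # ds_b shadows ds_a
ns.unload("/data/ds_b")
owner_after = ns.owner("peaks")    # ds_a's track is visible again
```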
pymisha.gdataset_ls
List currently loaded datasets.
Returns the normalized absolute paths of all datasets that have been
loaded into the current session via gdataset_load.
| RETURNS | DESCRIPTION |
|---|---|
| list[str] | Normalized absolute paths of loaded datasets. Empty list if no datasets are loaded. |
See Also
gdataset_load : Load a dataset into the namespace.
gdataset_info : Return metadata for a dataset.
Examples:
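Because the returned paths are normalized and absolute, a path passed in relative or redundant form has to be normalized before it is compared against the list. The sketch below uses only the standard library. The `normalize` helper is hypothetical, and the exact normalization pymisha applies (for example, whether symlinks are resolved) is an assumption here.

```python
import os


def normalize(path):
    """Expand '~' and collapse '.', '..' and trailing separators into an absolute path."""
    return os.path.abspath(os.path.expanduser(path))


# Hypothetical relative path with a redundant '..' segment and a trailing slash.
raw = "data/../data/ds_a/"
clean = normalize(raw)

# A membership test against a list of normalized paths then works as expected.
loaded_paths = [clean]  # stand-in for the list returned by gdataset_ls
is_loaded = normalize("data/ds_a") in loaded_paths
```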
pymisha.gdataset_save
gdataset_save(path: str, description: str, tracks: str | Iterable[str] | None = None, intervals: str | Iterable[str] | None = None, symlinks: bool = False, copy_seq: bool = False) -> str
Save selected tracks/intervals into a standalone dataset directory.
| PARAMETER | DESCRIPTION |
|---|---|
| path | Destination directory. Must not exist. |
| description | Dataset description stored in |
| tracks | Track names to include. |
| intervals | Interval set names to include. |
| symlinks | If True, link track/interval resources instead of copying. |
| copy_seq | If True, copy |

| RETURNS | DESCRIPTION |
|---|---|
| str | Absolute path of the created dataset directory. |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If neither |
See Also
gdataset_load : Load a dataset into the namespace.
gdataset_info : Return metadata for a dataset.
Examples:
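A thin wrapper can check the "must not exist" constraint on the destination before delegating to the save call. The sketch below is hypothetical: `export_dataset` and `fake_save` are illustrative names, and `fake_save` is a stand-in with the same keyword arguments as the signature above so the example runs without a database. In real use, `pymisha.gdataset_save` would presumably be passed as `save`.

```python
import os
import tempfile


def export_dataset(save, dest, description, tracks=None, intervals=None):
    """Refuse to overwrite `dest`, then delegate to a gdataset_save-like callable."""
    if os.path.exists(dest):
        # The parameter table states that the destination must not exist.
        raise ValueError(f"destination already exists: {dest}")
    # Keyword names follow the documented signature.
    return save(dest, description, tracks=tracks, intervals=intervals,
                symlinks=False, copy_seq=False)


def fake_save(path, description, tracks=None, intervals=None,
              symlinks=False, copy_seq=False):
    """Stand-in for the real call: only returns the absolute destination path."""
    return os.path.abspath(path)


with tempfile.TemporaryDirectory() as tmp:
    dest = os.path.join(tmp, "my_dataset")
    # "hypothetical.track" is a made-up track name.
    created = export_dataset(fake_save, dest, "Example export",
                             tracks=["hypothetical.track"])
```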
pymisha.gdataset_info
Return metadata and contents summary for a dataset path.
Reads the misha.yaml metadata file and scans the dataset for
tracks and intervals. The dataset does not need to be loaded.
| PARAMETER | DESCRIPTION |
|---|---|
| path | Path to a dataset directory (loaded or not). |

| RETURNS | DESCRIPTION |
|---|---|
| dict[str, Any] | Dictionary with keys: |
See Also
gdataset_ls : List loaded datasets.
gdataset_load : Load a dataset into the namespace.
Examples:
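Since the function reads a `misha.yaml` metadata file, a quick pre-check for that file can help filter candidate directories before asking for their metadata. This is a minimal sketch using only the standard library. The `has_metadata_file` helper is hypothetical, and it assumes the file sits at the top level of the dataset directory, which is not stated above.

```python
import os
import tempfile


def has_metadata_file(path, filename="misha.yaml"):
    """Return True if `path` contains the metadata file at its top level (assumed location)."""
    return os.path.isfile(os.path.join(path, filename))


with tempfile.TemporaryDirectory() as tmp:
    before = has_metadata_file(tmp)   # empty directory: no metadata yet
    # Create an empty placeholder file purely for the demonstration.
    open(os.path.join(tmp, "misha.yaml"), "w").close()
    after = has_metadata_file(tmp)
```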