Database
Functions for initializing, configuring, and managing genomic databases, including directory operations and genome creation.
pymisha.gdb_init
Initialize connection to a misha genomic database.
Loads the genome database at the given path and makes it available for all subsequent genomic operations. Must be called before any other pymisha function that accesses track data.
| PARAMETER | DESCRIPTION |
|---|---|
| path | Path to the root directory of the genome database. |
| userpath | Path to a user-writable database root. New tracks and interval sets will be created here. If None, defaults to |

| RETURNS | DESCRIPTION |
|---|---|
| None | |

See Also
gdb_reload : Refresh track lists after external changes.
gdb_unload : Disconnect from the database and clear all state.
gdb_info : Return metadata about the database.
gsetroot : Alternative entry point with directory validation.
Examples:
pymisha.gsetroot
Set the database root directory with validation.
Connects to a genome database after verifying that the directory
exists and contains the required tracks/ and seq/
subdirectories. This matches the R gsetroot() interface and is
the recommended entry point when working interactively, since it
provides clear error messages for invalid database paths.
| PARAMETER | DESCRIPTION |
|---|---|
| groot | Path to the genome database root directory. |
| subdir | Sub-directory within |
| dir | Backward-compatible alias for |
| rescan | If True, force a rescan of the database after initialization. Equivalent to calling :func: |

| RETURNS | DESCRIPTION |
|---|---|
| None | |

| RAISES | DESCRIPTION |
|---|---|
| FileNotFoundError | If |

See Also
gdb_init : Lower-level initializer without directory validation.
gdb_reload : Refresh track lists without re-initializing.
Examples:
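As a rough illustration of the validation rule described for gsetroot (the root must exist and contain tracks/ and seq/), the standalone sketch below expresses that check with the standard library only. The helper name `check_db_root` and its error messages are hypothetical; this is not pymisha's implementation.

```python
import os


def check_db_root(groot):
    """Hypothetical helper: verify the layout gsetroot is described as requiring."""
    if not os.path.isdir(groot):
        raise FileNotFoundError(f"Database root does not exist: {groot}")
    # The description names these two subdirectories as required.
    for sub in ("tracks", "seq"):
        if not os.path.isdir(os.path.join(groot, sub)):
            raise FileNotFoundError(f"Missing required subdirectory {sub}/ in {groot}")
    return os.path.abspath(groot)
```

Running a check of this kind before connecting is what allows an invalid path to fail early with a message naming the missing piece.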
pymisha.gdb_reload
Reload the database, refreshing track lists and metadata.
Re-scans the database root directories for newly created or removed tracks and interval sets. Call this after external modifications to the database on disk (e.g., tracks created by R misha or another process).
| RETURNS | DESCRIPTION |
|---|---|
| None | |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If no database is currently initialized. |

See Also
gdb_init : Initialize a database connection.
gdb_unload : Disconnect from the database entirely.
Examples:
pymisha.gdb_unload
Unload the database, clearing all state.
Disconnects from the currently active genome database and resets all
internal state including the database root paths, working directory,
datasets, and virtual tracks. After calling this function, a new
gdb_init call is required before any genomic operations.
| RETURNS | DESCRIPTION |
|---|---|
| None | |

See Also
gdb_init : Initialize a new database connection.
gdb_reload : Refresh without disconnecting.
Examples:
pymisha.gdb_info
Return high-level information about a misha database.
Inspects a genome database directory and returns metadata including the storage format, number of chromosomes, total genome size, and a table of per-chromosome sizes. Can be used to validate a database path without fully initializing a connection.
| PARAMETER | DESCRIPTION |
|---|---|
| groot | Path to a database root directory. If |

| RETURNS | DESCRIPTION |
|---|---|
| dict | Dictionary with keys: |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If |

See Also
gdb_init : Initialize a database connection.
Examples:
pymisha.gdb_examples_path
Return the path to the example database if available.
Checks the following locations in order:
1) PYMISHA_EXAMPLES_DB environment variable
2) pymisha/examples/trackdb/test (if packaged)
3) tests/testdb/trackdb/test (repo checkout)

| RETURNS | DESCRIPTION |
|---|---|
| str | Absolute path to the example database root directory. |

| RAISES | DESCRIPTION |
|---|---|
| FileNotFoundError | If the example database cannot be located in any of the searched locations. |

See Also
gdb_init_examples : Initialize the example database.
gdb_init : Initialize a custom database.
Examples:
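The documented lookup order can be sketched as a simple first-match search. The function name `find_examples_db` and its `package_dir`, `repo_dir`, and `environ` parameters are assumptions made for illustration; how pymisha actually resolves the packaged and repository locations is not stated here.

```python
import os


def find_examples_db(package_dir, repo_dir, environ=None):
    """Hypothetical helper mirroring the documented lookup order."""
    environ = os.environ if environ is None else environ
    candidates = []
    env_path = environ.get("PYMISHA_EXAMPLES_DB")
    if env_path:
        candidates.append(env_path)  # 1) environment variable
    # 2) packaged copy, then 3) repository checkout
    candidates.append(os.path.join(package_dir, "pymisha", "examples", "trackdb", "test"))
    candidates.append(os.path.join(repo_dir, "tests", "testdb", "trackdb", "test"))
    for path in candidates:
        if os.path.isdir(path):
            return os.path.abspath(path)
    raise FileNotFoundError("Example database not found in: " + ", ".join(candidates))
```

Because the environment variable is consulted first, setting PYMISHA_EXAMPLES_DB is the way to point at a non-default copy.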
pymisha.gdb_init_examples
Initialize the example database (mirrors R's gdb.init_examples).
| PARAMETER | DESCRIPTION |
|---|---|
| copy | If True, copy the example DB into a temp dir before initializing. This avoids mutating the repo data when running examples. |

| RETURNS | DESCRIPTION |
|---|---|
| str | Path to the initialized example DB. |

See Also
gdb_examples_path : Get the path to the example database.
gdb_init : Initialize a custom database.
Examples:
pymisha.gdb_create
gdb_create(groot, fasta, genes_file=None, annots_file=None, annots_names=None, db_format='indexed', verbose=False, **kwargs)
Create a new Genomic Database from FASTA file(s).
Creates the directory structure, imports sequences, and writes the chromosome sizes file. Two formats are supported:
- "indexed" (default): Single genome.seq + genome.idx. Recommended for genomes with many contigs.
- "per-chromosome": Separate .seq file per contig in the seq/ directory.

| PARAMETER | DESCRIPTION |
|---|---|
| groot | Path for the new database root directory. |
| fasta | Path(s) to FASTA file(s). Gzipped files (.fa.gz) are supported. |
| genes_file | Path to genes annotation file. Not yet implemented. |
| annots_file | Path to annotations file. Not yet implemented. |
| annots_names | Names for annotations. Not yet implemented. |
| db_format | Database format: |
| format | Backward-compatible alias for |
| verbose | If True, print progress messages. |

| RETURNS | DESCRIPTION |
|---|---|
| DataFrame | DataFrame with columns |

| RAISES | DESCRIPTION |
|---|---|
| FileExistsError | If the target directory already exists. |
| FileNotFoundError | If a FASTA file does not exist. |
| ValueError | If no contigs are found, duplicate contig names are detected, or an unsupported format is specified. |

See Also
gdb_init : Initialize a database connection.
gdb_reload : Reload the current database.
gdb_create_genome : Download and initialize a prebuilt genome.
gdb_convert_to_indexed : Convert per-chromosome format to indexed.
Examples:
Create a database from a single FASTA file:
Create from multiple FASTA files:
Create a per-chromosome database:
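To make the contig checks listed under Raises concrete, here is a minimal standalone sketch of collecting contig sizes from one or more FASTA files, including gzipped ones. The helper `contig_sizes` is hypothetical, and taking the first whitespace-delimited token of each header as the contig name is an assumption; pymisha may parse or normalize names differently.

```python
import gzip


def contig_sizes(fasta_paths):
    """Hypothetical helper: map contig name -> sequence length across FASTA files."""
    if isinstance(fasta_paths, str):
        fasta_paths = [fasta_paths]
    sizes = {}
    for path in fasta_paths:
        # Plain open() raises FileNotFoundError for a missing file.
        opener = gzip.open if path.endswith(".gz") else open
        name = None
        with opener(path, "rt") as handle:
            for line in handle:
                line = line.strip()
                if line.startswith(">"):
                    tokens = line[1:].split()
                    if not tokens:
                        raise ValueError(f"Empty contig name in {path}")
                    name = tokens[0]  # assumption: name is the first header token
                    if name in sizes:
                        raise ValueError(f"Duplicate contig name: {name}")
                    sizes[name] = 0
                elif line and name is not None:
                    sizes[name] += len(line)
    if not sizes:
        raise ValueError("No contigs found")
    return sizes
```

A mapping like this is the kind of information a chromosome sizes file records, one contig per entry.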
pymisha.gdb_create_genome
Download and initialize a prebuilt genome database.
| PARAMETER | DESCRIPTION |
|---|---|
| genome | Genome identifier. Supported values: |
| path | Directory to extract into. Defaults to current working directory. |
| tmpdir | Directory to store the temporary downloaded archive. Defaults to |
| verify_checksum | If True, download and verify the archive SHA256 checksum from |

| RETURNS | DESCRIPTION |
|---|---|
| None | |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If the genome identifier is not supported. |
| FileNotFoundError | If the downloaded archive does not contain the expected directory. |

See Also
gdb_create : Create a database from local FASTA files.
gdb_init : Initialize a database connection.
Examples:
pymisha.gdb_create_linked
Create a linked database that reuses sequence data from a parent DB.
Creates a new DB root with a writable tracks/ directory and symlinks
to the parent's seq/ directory and chrom_sizes.txt file.
| PARAMETER | DESCRIPTION |
|---|---|
| path | Path for the new linked DB. |
| parent | Path to parent DB root. |

| RETURNS | DESCRIPTION |
|---|---|
| bool | |

| RAISES | DESCRIPTION |
|---|---|
| FileNotFoundError | If the parent database directory does not exist or is missing required files ( |
| FileExistsError | If the target path already exists. |

See Also
gdb_create : Create a new database from FASTA files.
gdataset_load : Load a dataset into the namespace.
gdataset_ls : List loaded datasets.
Examples:
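The layout described for a linked database (a fresh tracks/ directory plus symlinks to the parent's seq/ and chrom_sizes.txt) can be sketched with the standard library. The helper `make_linked_root` is hypothetical, and treating seq/ and chrom_sizes.txt as the parent's required entries is an assumption based on the description; this is an illustration, not pymisha's code.

```python
import os


def make_linked_root(path, parent):
    """Hypothetical helper: build the linked layout described above."""
    parent = os.path.abspath(parent)
    seq_dir = os.path.join(parent, "seq")
    sizes_file = os.path.join(parent, "chrom_sizes.txt")
    # Assumption: these two entries are what the parent must provide.
    if not os.path.isdir(seq_dir) or not os.path.isfile(sizes_file):
        raise FileNotFoundError(f"Parent database is missing or incomplete: {parent}")
    if os.path.lexists(path):
        raise FileExistsError(f"Target path already exists: {path}")
    # The new root owns its tracks/ directory; sequence data is shared via symlinks.
    os.makedirs(os.path.join(path, "tracks"))
    os.symlink(seq_dir, os.path.join(path, "seq"), target_is_directory=True)
    os.symlink(sizes_file, os.path.join(path, "chrom_sizes.txt"))
```

Sharing seq/ through a symlink avoids duplicating sequence data while keeping new tracks separate from the parent.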
pymisha.gdb_convert_to_indexed
gdb_convert_to_indexed(groot=None, remove_old_files=False, force=False, validate=True, convert_tracks=False, convert_intervals=False, verbose=False, chunk_size=104857600)
Convert a per-chromosome database to indexed genome format.
| PARAMETER | DESCRIPTION |
|---|---|
| groot | Database root. If None, uses currently active DB. |
| remove_old_files | If True, remove old per-chromosome |
| force | Kept for parity with R API. Ignored in non-interactive Python flow. |
| validate | If True, validates converted |
| convert_tracks | If True, converts all eligible tracks to indexed format. |
| convert_intervals | If True, converts all eligible interval sets to indexed format. |
| verbose | If True, prints conversion progress. |
| chunk_size | I/O chunk size for reading sequence files. |

| RETURNS | DESCRIPTION |
|---|---|
| None | |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If |
| FileNotFoundError | If the database directory, |

See Also
gdb_create : Create a new database from FASTA files.
gdb_init : Initialize a database connection.
Examples:
Convert the currently active database to indexed format:
Convert a specific database with full options:
pymisha.gdb_get_readonly_attrs
Return read-only track attributes for the current database.
Returns the list of track attribute names that are protected from
modification or deletion. If no attributes are marked as read-only,
None is returned.
| RETURNS | DESCRIPTION |
|---|---|
| list[str] \| None | List of read-only attribute names, or |

See Also
gdb_set_readonly_attrs : Set the list of read-only attributes.
Examples:
pymisha.gdb_set_readonly_attrs
Set the list of read-only track attributes for the current database.
| PARAMETER | DESCRIPTION |
|---|---|
| attrs | Attribute names to protect. Pass |

| RETURNS | DESCRIPTION |
|---|---|
| None | |

| RAISES | DESCRIPTION |
|---|---|
| ValueError | If an attribute name is empty or appears more than once. |

See Also
gdb_get_readonly_attrs : Return the current read-only attributes.
Examples:
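The ValueError conditions listed above (an empty name, or a name that appears more than once) amount to a small validation pass over the list. The helper `check_readonly_attrs` below is a hypothetical sketch of such a pass, not the library's own code.

```python
def check_readonly_attrs(attrs):
    """Hypothetical helper: reject empty or repeated attribute names."""
    seen = set()
    for name in attrs:
        if not name:
            raise ValueError("Attribute names must be non-empty")
        if name in seen:
            raise ValueError(f"Attribute appears more than once: {name}")
        seen.add(name)
    return list(attrs)
```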
pymisha.gdir_cwd
Return the current working directory in the genomic database.
Returns the absolute path of the current working directory in the genomic database. This is not the shell's current working directory but the directory within the misha tracks tree used for resolving track and interval set names.
| RETURNS | DESCRIPTION |
|---|---|
| str | Absolute path of the current working directory within the database. |

See Also
gdir_cd : Change the current working directory.
gdir_create : Create a new directory in the database.
gdir_rm : Delete a directory from the database.
Examples:
pymisha.gdir_cd
Change the current working directory in the genomic database.
Changes the directory used for resolving track and interval set names.
The list of database objects (tracks, intervals) is rescanned
recursively under the new directory. Object names are updated relative
to the new working directory. For example, a track named
subdir.dense becomes dense once the working directory is set
to subdir. All virtual tracks are cleared.
| PARAMETER | DESCRIPTION |
|---|---|
| dir | Directory path (relative to current working directory, or ".."). |

| RETURNS | DESCRIPTION |
|---|---|
| None | |

See Also
gdir_cwd : Return the current working directory.
gdir_create : Create a new directory in the database.
gdir_rm : Delete a directory from the database.
Examples:
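The renaming rule described above (subdir.dense becomes dense once the working directory is subdir) can be sketched as prefix stripping on dotted names. The helper `relative_track_name` is hypothetical; expressing the working directory as a dotted prefix and returning None for tracks outside it are assumptions made for illustration.

```python
def relative_track_name(name, cwd):
    """Hypothetical helper: track name as seen from the dotted directory `cwd`.

    Returns None when the track lies outside `cwd`.
    """
    if not cwd:
        return name  # at the top of the tracks tree, names are unchanged
    prefix = cwd + "."
    if name.startswith(prefix):
        return name[len(prefix):]
    return None


print(relative_track_name("subdir.dense", "subdir"))  # prints "dense"
```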
pymisha.gdir_create
Create a new directory in the genomic database.
Creates a single directory level under the current working directory.
Only the last element in the specified path is created; recursive
directory creation is not supported. A new directory cannot be created
within an existing .track directory.
| PARAMETER | DESCRIPTION |
|---|---|
| dir | Directory path relative to the current working directory. |
| show_warnings | If True, show warnings (currently unused; kept for R parity). |

| RETURNS | DESCRIPTION |
|---|---|
| None | |

| RAISES | DESCRIPTION |
|---|---|
| FileNotFoundError | If the parent directory does not exist. |
| ValueError | If the target is inside a |

See Also
gdir_rm : Delete a directory from the database.
gdir_cd : Change the current working directory.
gdir_cwd : Return the current working directory.
Examples:
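The single-level behavior described above matches the difference between os.mkdir and os.makedirs in the standard library. The sketch below uses a hypothetical helper, `create_db_dir`, operating on an ordinary filesystem directory that stands in for the database working directory; it is an illustration under those assumptions, not pymisha's implementation.

```python
import os


def create_db_dir(cwd, rel_path):
    """Hypothetical helper: create only the last component of rel_path under cwd."""
    target = os.path.normpath(os.path.join(cwd, rel_path))
    # Refuse to create anything below a path component ending in ".track".
    parts = os.path.relpath(target, cwd).split(os.sep)
    if any(part.endswith(".track") for part in parts[:-1]):
        raise ValueError(f"Cannot create a directory inside a track: {rel_path}")
    # os.mkdir, unlike os.makedirs, raises FileNotFoundError if the parent is missing.
    os.mkdir(target)
    return target
```

Creating a nested path therefore takes one call per level, parent first.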
pymisha.gdir_rm
Delete a directory from the genomic database.
If recursive is True, the directory is deleted with all files and
subdirectories it contains. Cannot delete .track directories
directly; use track-removal functions instead.
| PARAMETER | DESCRIPTION |
|---|---|
| dir | Directory path relative to the current working directory. |
| recursive | If True, delete the directory and all its contents. |
| force | If True, suppress errors for non-existent directories. |

| RETURNS | DESCRIPTION |
|---|---|
| None | |

| RAISES | DESCRIPTION |
|---|---|
| FileNotFoundError | If the directory does not exist and |
| ValueError | If the target is a |
| OSError | If the directory is not empty and |

See Also
gdir_create : Create a new directory in the database.
gdir_cd : Change the current working directory.
gdir_cwd : Return the current working directory.
Examples:
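The interaction of the recursive and force flags, as described in the parameter table, can be sketched on an ordinary filesystem directory. The helper `remove_db_dir` is hypothetical and stands in for the database-aware function; the exact conditions under which pymisha raises each error are only partly stated above, so the branches below follow the parameter descriptions.

```python
import os
import shutil


def remove_db_dir(cwd, rel_path, recursive=False, force=False):
    """Hypothetical helper following the parameter descriptions for gdir_rm."""
    target = os.path.normpath(os.path.join(cwd, rel_path))
    if target.endswith(".track"):
        raise ValueError(f"Refusing to delete a track directory: {rel_path}")
    if not os.path.isdir(target):
        if force:
            return  # force: a missing directory is not treated as an error
        raise FileNotFoundError(f"No such directory: {rel_path}")
    if recursive:
        shutil.rmtree(target)  # removes the directory and everything below it
    else:
        os.rmdir(target)  # raises OSError if the directory is not empty
```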
pymisha.gtrack_create_dirs
Create the directory hierarchy needed for a dotted track name.
For example, gtrack_create_dirs("proj.sample.my_track") creates
the directories proj and proj/sample under the current
working directory. Use this function with caution -- a long track
name may create a deep directory structure.
| PARAMETER | DESCRIPTION |
|---|---|
| track | Track name with dot-separated namespace. |
| mode | Directory permissions (currently passed to os.mkdir). |

| RETURNS | DESCRIPTION |
|---|---|
| None | |

See Also
gdir_create : Create a single directory in the database.
gdir_cwd : Return the current working directory.
Examples:
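The mapping from a dotted track name to the directories it implies can be sketched as follows. Both helpers, `track_parent_dirs` and `create_track_dirs`, are hypothetical, the `root` argument stands in for the current working directory of the database, and the default mode of 0o777 is simply the default of os.mkdir rather than a documented pymisha value.

```python
import os


def track_parent_dirs(track):
    """Hypothetical helper: directories implied by a dotted track name."""
    parts = track.split(".")[:-1]  # the last element names the track itself
    return [os.path.join(*parts[: i + 1]) for i in range(len(parts))]


def create_track_dirs(root, track, mode=0o777):
    """Create each implied directory under root, skipping ones that already exist."""
    for rel in track_parent_dirs(track):
        path = os.path.join(root, rel)
        if not os.path.isdir(path):
            os.mkdir(path, mode)
```

For "proj.sample.my_track" the first helper yields proj and proj/sample, so the directory depth grows with the number of dots in the name.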