Convert Database to Indexed Genome Format

Converts a per-chromosome database to indexed genome format with a single consolidated genome.seq file and genome.idx index. Optionally also converts tracks and interval sets to indexed format.

gdb.convert_to_indexed(
  groot = NULL,
  remove_old_files = FALSE,
  force = FALSE,
  validate = TRUE,
  convert_tracks = FALSE,
  convert_intervals = FALSE,
  verbose = FALSE,
  chunk_size = 104857600
)

Arguments

groot: Root directory of the database to convert to indexed format. If NULL, uses the currently active database.
remove_old_files: Logical. If TRUE, removes old per-chromosome files after successful conversion. Default: FALSE.
force: Logical. If TRUE, forces the conversion without confirmation. Default: FALSE.
validate: Logical. If TRUE, validates the conversion by comparing sequences. Default: TRUE.
convert_tracks: Logical. If TRUE, also converts all eligible tracks to indexed format. Default: FALSE.
convert_intervals: Logical. If TRUE, also converts all eligible interval sets to indexed format. Default: FALSE.
verbose: Logical. If TRUE, prints verbose messages. Default: FALSE.
chunk_size: Integer. Size, in bytes, of each chunk read from the sequence files. Default: 104857600 (100 MiB). Reduce it if you run into memory issues.

Value

Invisible NULL

Details

This function converts a per-chromosome database (with separate .seq files per contig) to indexed format (single genome.seq + genome.idx). The indexed format provides better performance and scalability, especially for genomes with many contigs.

Important: Preserving Chromosome Order

For an exact conversion, one that gives bit-for-bit identical results before and after, load the source database first using gsetroot() or gdb.init(). The source of the chromosome order depends on whether the database is loaded:

If database is loaded: Uses chromosome order from ALLGENOME (exact preservation)
If database is not loaded: Uses order from chrom_sizes.txt (may differ from ALLGENOME)

Loading the database first ensures that the converted database keeps exactly the same chromosome ordering. The ordering affects iteration order, interval IDs, and any other operation that depends on chromosome order.
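Before converting an unloaded database, it can help to check whether the two sources of chromosome order agree. The following is a minimal sketch in base R, not part of the package: chrom_order_matches() is a hypothetical helper, and it assumes chrom_sizes.txt is a tab-separated file with no header and the chromosome name in the first column. With a loaded database, the first argument would typically come from something like as.character(ALLGENOME[[1]]$chrom); the names in the file may need the same prefix convention before they can be compared.

```r
# Hypothetical helper: TRUE if the loaded chromosome order equals the order
# of the rows in chrom_sizes.txt.
# Assumption: the file is tab-separated, has no header, and lists the
# chromosome name in its first column.
chrom_order_matches <- function(loaded_chroms, chrom_sizes_path) {
    sizes <- read.table(
        chrom_sizes_path,
        sep = "\t", header = FALSE,
        col.names = c("chrom", "size"),
        colClasses = c("character", "numeric")
    )
    identical(as.character(loaded_chroms), sizes$chrom)
}

# Stand-in data so the sketch runs without a database
sizes_file <- tempfile(fileext = ".txt")
writeLines(c("chr1\t1000", "chr10\t500", "chr2\t800"), sizes_file)

# Stand-in for the order of a loaded database
loaded_order <- c("chr1", "chr2", "chr10")
order_matches <- chrom_order_matches(loaded_order, sizes_file)
```

If the check returns FALSE, loading the database with gsetroot() before calling gdb.convert_to_indexed() is the safer route.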

The conversion process:

1. Checks if database is already in indexed format
2. Gets chromosome order from ALLGENOME (if loaded) or chrom_sizes.txt
3. Consolidates all per-chromosome .seq files into genome.seq
4. Creates genome.idx with CRC64 checksum
5. Optionally validates the conversion
6. Optionally removes old .seq files
7. If convert_tracks=TRUE, converts all eligible 1D tracks (dense, sparse, array)
8. If convert_intervals=TRUE, converts all eligible interval sets (1D and 2D)
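The consolidation step and the role of chunk_size can be illustrated with a small base R sketch. This is not the package's implementation: consolidate_seq_files() is a hypothetical function, the index it returns is a plain data frame rather than the real genome.idx layout, and no checksum is computed. It only shows the general idea of appending per-contig files in a fixed order, reading a bounded number of bytes at a time, and recording where each contig starts.

```r
# Hypothetical illustration: concatenate per-contig files into one file and
# record the byte offset and length of each contig.
consolidate_seq_files <- function(seq_files, out_path, chunk_size = 1024L) {
    out <- file(out_path, open = "wb")
    on.exit(close(out), add = TRUE)

    offsets <- numeric(length(seq_files))
    lengths <- numeric(length(seq_files))
    position <- 0

    for (i in seq_along(seq_files)) {
        con <- file(seq_files[i], open = "rb")
        offsets[i] <- position
        repeat {
            # Read at most chunk_size bytes so memory use stays bounded
            chunk <- readBin(con, what = "raw", n = chunk_size)
            if (length(chunk) == 0) break
            writeBin(chunk, out)
            position <- position + length(chunk)
        }
        close(con)
        lengths[i] <- position - offsets[i]
    }

    data.frame(file = basename(seq_files), offset = offsets, length = lengths)
}

# Stand-in input: two tiny "contig" files in a temporary directory
work_dir <- tempfile("seqdemo")
dir.create(work_dir)
seq_files <- file.path(work_dir, c("chrA.seq", "chrB.seq"))
writeBin(charToRaw("ACGTACGT"), seq_files[1])
writeBin(charToRaw("GGCC"), seq_files[2])

genome_path <- file.path(work_dir, "genome.seq")
index <- consolidate_seq_files(seq_files, genome_path, chunk_size = 3L)
```

The order of seq_files determines the layout of the output, which is why the chromosome order used in step 2 matters for the result.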

Tracks and intervals that cannot be converted (and are skipped):

Tracks: 2D tracks, virtual tracks, single-file tracks, already converted tracks
Intervals: Single-file interval sets, already converted interval sets

Examples

if (FALSE) { # \dontrun{
# Recommended: Load database first for exact conversion
gsetroot("/path/to/database")
gdb.convert_to_indexed(
    convert_tracks = TRUE,
    convert_intervals = TRUE,
    remove_old_files = TRUE,
    verbose = TRUE
)

# Convert current database to indexed format (genome only)
gdb.convert_to_indexed()

# Convert specific database without loading it first
# Note: chromosome order may differ from ALLGENOME
gdb.convert_to_indexed(groot = "/path/to/database")

# Convert genome and all tracks to indexed format
gdb.convert_to_indexed(convert_tracks = TRUE)

# Full conversion with validation and cleanup
gsetroot("/path/to/database") # Load first for exact order preservation
gdb.convert_to_indexed(
    convert_tracks = TRUE,
    convert_intervals = TRUE,
    remove_old_files = TRUE,
    validate = TRUE,
    verbose = TRUE
)
} # }
