Replaces specified intervals in a reference genome with donor DNA sequences and writes the result as a new FASTA file. Optionally creates a misha trackdb from the output.
ggenome.implant(
intervals,
donor,
output,
genome_fasta = NULL,
create_trackdb = TRUE,
trackdb_path = NULL,
line_width = 80L,
overwrite = FALSE
)A data.frame with chrom, start, end
columns specifying the regions to replace. Coordinates are 0-based,
half-open (standard misha convention).
Either a character vector of DNA sequences (one per interval row), or a single string path to a misha database root from which sequences will be extracted at the same intervals.
Path for the output FASTA file.
Path to the reference FASTA file to edit. If NULL,
the current misha database is exported via gdb.export_fasta.
Logical. If TRUE, creates a misha trackdb from
the output FASTA using gdb.create.
Path for the new trackdb. Defaults to
<dirname(output)>/trackdb.
Integer. Number of bases per FASTA line. Default: 80.
Logical. If TRUE, overwrite existing output file.
Default: FALSE.
Invisibly returns the output FASTA path.
This function fills the gap between manual sequence extraction and genome perturbation by providing a single-call interface for editing genomes.
The donor parameter controls where replacement sequences come from:
If donor is a character vector with one sequence per interval
row, those literal sequences are used directly.
If donor is a single string pointing to an existing directory,
it is treated as a misha database root. Sequences are extracted from
that database at the same coordinates as intervals.
Perturbations are applied in reverse coordinate order within each chromosome so that earlier coordinates remain valid when later ones are replaced.
gdb.init_examples()
# Export the example DB to a reference FASTA
ref_fasta <- tempfile(fileext = ".fa")
gdb.export_fasta(ref_fasta)
# Replace two regions with literal sequences
intervals <- data.frame(
chrom = c("chr1", "chr1"),
start = c(100, 200),
end = c(110, 210)
)
donors <- c("AAAAAAAAAA", "CCCCCCCCCC")
out <- tempfile(fileext = ".fa")
trackdb <- tempfile()
ggenome.implant(intervals, donors,
output = out,
genome_fasta = ref_fasta,
create_trackdb = TRUE,
trackdb_path = trackdb
)
# Verify the implanted sequences via the new trackdb
gdb.init(trackdb)
gseq.extract(data.frame(chrom = "chr1", start = 100, end = 110))
#> [1] "AAAAAAAAAA"
# Clean up
unlink(c(ref_fasta, out, paste0(out, ".fai"), trackdb), recursive = TRUE)