Performs an iterative replacement of a target k-mer with a
replacement sequence. This is useful for creating synthetic genomes
with specific motifs removed (e.g., creating a CpG-null genome by iteratively
swapping CG to GC).
gsynth.replace_kmer(
target,
replacement,
output_path = NULL,
output_format = c("misha", "fasta", "vector"),
intervals = NULL,
check_composition = TRUE
)
target: The k-mer sequence to remove (e.g., "CG").
replacement: The replacement sequence (e.g., "GC").
output_path: Path to the output file (ignored when output_format = "vector").
output_format: Output format:
  "misha": .seq binary format (default)
  "fasta": FASTA text format
  "vector": Return sequences as a character vector (does not write to file)
intervals: Genomic intervals to process. If NULL, uses all chromosomes.
check_composition: Logical. If TRUE (default), ensures target and replacement have the same nucleotide composition (preserving exact base counts).
When output_format is "misha" or "fasta", writes to output_path and invisibly returns NULL. When output_format is "vector", returns a character vector of modified sequences.
Bubble Sort / Iterative Logic: The function scans the sequence and
replaces occurrences of target with replacement. If a replacement
creates a new instance of target (e.g., replacing "CG" with "GC" turns
"CCG" into "CGC", which again contains "CG"), the new instance is also
replaced ("CGC" -> "GCC"). This continues until the sequence is free of
the target k-mer.
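The iterative logic described above can be illustrated on a plain character string. This is a minimal sketch in base R, not the package's implementation: the helper name replace_kmer_iter and its max_passes guard are assumptions made for illustration, and the real function operates on genome sequences rather than on a single string.

```r
# Hypothetical helper (not part of the package): repeat the replacement
# until no occurrence of `target` is left in the string `x`.
replace_kmer_iter <- function(x, target, replacement, max_passes = 1000L) {
  passes <- 0L
  while (grepl(target, x, fixed = TRUE)) {
    # Guard against target/replacement pairs that never converge
    if (passes >= max_passes) {
      stop("target still present after ", max_passes, " passes")
    }
    # One left-to-right pass over all non-overlapping matches
    x <- gsub(target, replacement, x, fixed = TRUE)
    passes <- passes + 1L
  }
  x
}

# "CCG" -> "CGC" -> "GCC": the first pass creates a new "CG",
# which the second pass removes.
replace_kmer_iter("CCG", "CG", "GC")
```

The max_passes guard is there because an arbitrary pair (for example, a replacement that itself contains the target) would otherwise loop forever.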
When target and replacement are permutations of each other
(e.g., "CG" and "GC"), this acts as a "bubble sort" of nucleotides, moving
bases locally without altering the total GC content or base counts of the genome.
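The base-count claim can be checked on a toy string. The sketch below uses only base R; same_composition and base_counts are hypothetical helpers written for this illustration, and how check_composition is actually implemented inside the package is not shown in this page.

```r
# Hypothetical helper: TRUE when two k-mers are permutations of each other.
same_composition <- function(a, b) {
  identical(sort(strsplit(a, "")[[1]]), sort(strsplit(b, "")[[1]]))
}

# Hypothetical helper: count A/C/G/T in a string.
base_counts <- function(s) {
  table(factor(strsplit(s, "")[[1]], levels = c("A", "C", "G", "T")))
}

same_composition("CG", "GC")  # permutation: base counts are preserved
same_composition("CG", "TT")  # not a permutation: base counts would change

# Swap "CG" to "GC" until no "CG" is left, then compare base counts.
before <- "TTCGCGAACCGG"
after <- before
while (grepl("CG", after, fixed = TRUE)) {
  after <- gsub("CG", "GC", after, fixed = TRUE)
}

after
all(base_counts(before) == base_counts(after))
```

Within each stretch of C and G bases, every G ends up ahead of every C, which is the "sorted" state the bubble sort analogy refers to.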
if (FALSE) { # \dontrun{
# Robust removal of all CpG dinucleotides (preserving GC%)
gsynth.replace_kmer(
target = "CG",
replacement = "GC",
output_path = "genome_no_cpg.seq",
output_format = "misha"
)
} # }