Creates a new Genomic Database.
gdb.create(
    groot = NULL,
    fasta = NULL,
    genes.file = NULL,
    annots.file = NULL,
    annots.names = NULL
)
groot: path to the newly created database
fasta: an array of names or URLs of FASTA files. Can contain wildcards for multiple files
genes.file: name or URL of the file that contains genes. If 'NULL', no genes are imported
annots.file: name or URL of the file that contains annotations. If 'NULL', no annotations are imported
annots.names: annotation names
None.
This function creates a new Genomic Database at the location specified by 'groot'. FASTA files are converted to 'Seq' format and an appropriate 'chrom_sizes.txt' file is generated (see "User Manual" for more details).
If 'genes.file' is not 'NULL', four sets of intervals are created in the database: tss, exons, utr3 and utr5. See gintervals.import_genes for more details about importing gene intervals.
'fasta', 'genes.file' and 'annots.file' can each be either a file path or a URL of the form 'ftp://[address]/[file]'. 'fasta' can also contain wildcards to indicate multiple files. The files that these arguments point to can be zipped or unzipped.
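As a rough illustration of the wildcard form, the sketch below builds a database from local files only. The directory '/path/to/fasta' and the database name 'local_genome' are hypothetical placeholders, not part of the package, and the call is guarded so that nothing runs unless misha is installed and the pattern matches at least one file. 'genes.file' and 'annots.file' are left at their 'NULL' defaults, so no genes or annotations are imported.

```r
# Hypothetical local layout: one gzipped FASTA file per chromosome.
fasta_pattern <- file.path("/path/to/fasta", "chr*.fa.gz") # assumed path
db_dir <- file.path(tempdir(), "local_genome")             # assumed location

# Only attempt the build if misha is installed and the pattern matches files.
have_files <- length(Sys.glob(fasta_pattern)) > 0
if (requireNamespace("misha", quietly = TRUE) && have_files) {
    # The wildcard is passed through as-is in 'fasta'.
    misha::gdb.create(groot = db_dir, fasta = fasta_pattern)
    misha::gdb.init(db_dir)
    print(misha::gintervals.all())
}
```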
See the 'Genomes' vignette for details on how to create a database from common genome sources.
# \donttest{
ftp <- "ftp://hgdownload.soe.ucsc.edu/goldenPath/mm10"
mm10_dir <- file.path(tempdir(), "mm10")
# only a single chromosome is loaded in this example;
# see the "Genomes" vignette for how to download all of them
# and how to download other genomes
gdb.create(
    mm10_dir,
    paste(ftp, "chromosomes", paste0(
        "chr", c("X"),
        ".fa.gz"
    ), sep = "/"),
    paste(ftp, "database/knownGene.txt.gz", sep = "/"),
    paste(ftp, "database/kgXref.txt.gz", sep = "/"),
    c(
        "kgID", "mRNA", "spID", "spDisplayID", "geneSymbol",
        "refseq", "protAcc", "description", "rfamAcc",
        "tRnaName"
    )
)
#> Downloading ftp://hgdownload.soe.ucsc.edu/goldenPath/mm10/chromosomes/chrX.fa.gz
#> Building Seq files...
#> chrX
#> Downloading ftp://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/knownGene.txt.gz
#> Downloading ftp://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/kgXref.txt.gz
#> Database was successfully created
gdb.init(mm10_dir)
gintervals.ls()
#> [1] "exons" "tss" "utr3" "utr5"
gintervals.all()
#>   chrom start       end
#> 1  chrX     0 171031299
# }
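The example above loads a single chromosome. Because 'fasta' accepts an array of URLs, the same 'paste()' pattern can produce one URL per chromosome. The sketch below only builds the URL vector and downloads nothing; the chromosome subset 'c("19", "X")' is an arbitrary illustration, and the resulting 'fasta_urls' would take the place of the second argument in the call above.

```r
# Same UCSC base URL as in the example above.
ftp <- "ftp://hgdownload.soe.ucsc.edu/goldenPath/mm10"

# Illustrative subset of chromosomes (an assumption for this sketch).
chroms <- c("19", "X")

# paste() is vectorised, so this yields one URL per chromosome.
fasta_urls <- paste(ftp, "chromosomes", paste0("chr", chroms, ".fa.gz"), sep = "/")
print(fasta_urls)
```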