Creates a new Genomic Database.

gdb.create(
  groot = NULL,
  fasta = NULL,
  genes.file = NULL,
  annots.file = NULL,
  annots.names = NULL
)

Arguments

groot

path to the newly created database

fasta

a character vector of file names or URLs of FASTA files. May contain wildcards to match multiple files

genes.file

name or URL of the file that contains genes. If 'NULL', no genes are imported

annots.file

name or URL of the file that contains annotations. If 'NULL', no annotations are imported

annots.names

names of the annotation columns in 'annots.file'

Value

None.

Details

This function creates a new Genomic Database at the location specified by 'groot'. FASTA files are converted to 'Seq' format, and an appropriate 'chrom_sizes.txt' file is generated (see "User Manual" for more details).

If 'genes.file' is not 'NULL', four sets of intervals are created in the database: 'tss', 'exons', 'utr3' and 'utr5'. See 'gintervals.import_genes' for more details about importing gene intervals.

'fasta', 'genes.file' and 'annots.file' can each be either a file path or a URL of the form 'ftp://[address]/[file]'. 'fasta' can also contain wildcards to select multiple files. The files these arguments point to may be either compressed (e.g. '.gz') or uncompressed.
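The same arguments also accept local files. The following is a minimal sketch, assuming the misha package is installed; the toy genome, the directory names and the FASTA contents are hypothetical and exist only for illustration. Two tiny FASTA files are written to a temporary directory and imported together through a single wildcard pattern, with 'genes.file' and 'annots.file' left as 'NULL' so that only sequences are imported.

```r
library(misha)

# Hypothetical toy genome: two small FASTA files in a temporary directory
fasta_dir <- file.path(tempdir(), "toy_fasta")
dir.create(fasta_dir, showWarnings = FALSE)
writeLines(c(">chr1", "ACGTACGTACGTNNNNACGT"), file.path(fasta_dir, "chr1.fa"))
writeLines(c(">chr2", "GGGGCCCCAAAATTTT"), file.path(fasta_dir, "chr2.fa"))

# tempfile() returns a path that does not exist yet, so the database
# is created in a fresh directory
toy_dir <- tempfile("toy_db")

# One wildcard pattern selects both FASTA files; genes.file and
# annots.file keep their NULL defaults, so no gene intervals are imported
gdb.create(groot = toy_dir, fasta = file.path(fasta_dir, "*.fa"))

# Switch to the new database and list its chromosomes
gdb.init(toy_dir)
gintervals.all()
```

Because no genes were imported, 'gintervals.ls()' is not expected to list 'tss', 'exons', 'utr3' or 'utr5' for this database.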

See the 'Genomes' vignette for details on how to create a database from common genome sources.

Examples

# \donttest{
ftp <- "ftp://hgdownload.soe.ucsc.edu/goldenPath/mm10"
mm10_dir <- file.path(tempdir(), "mm10")
# only a single chromosome is loaded in this example
# see the "Genomes" vignette for how to download all of them
# and how to download other genomes
gdb.create(
    groot = mm10_dir,
    fasta = paste(ftp, "chromosomes", paste0(
        "chr", c("X"),
        ".fa.gz"
    ), sep = "/"),
    genes.file = paste(ftp, "database/knownGene.txt.gz", sep = "/"),
    annots.file = paste(ftp, "database/kgXref.txt.gz", sep = "/"),
    annots.names = c(
        "kgID", "mRNA", "spID", "spDisplayID", "geneSymbol",
        "refseq", "protAcc", "description", "rfamAcc",
        "tRnaName"
    )
)
#> Downloading ftp://hgdownload.soe.ucsc.edu/goldenPath/mm10/chromosomes/chrX.fa.gz
#> Building Seq files...
#> chrX
#> Downloading ftp://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/knownGene.txt.gz
#> Downloading ftp://hgdownload.soe.ucsc.edu/goldenPath/mm10/database/kgXref.txt.gz
#> Database was successfully created
gdb.init(mm10_dir)
gintervals.ls()
#> [1] "exons" "tss"   "utr3"  "utr5" 
gintervals.all()
#>   chrom start       end
#> 1  chrX     0 171031299
# }