What is a Daf?

A Daf (Data Axes Format) is a typed, multi-axis container for scientific data. Each daf holds:

  • Axes - named lists of entries (e.g. cell, gene, donor).
  • Scalars - single values (e.g. organism = "human").
  • Vectors - one value per entry on a chosen axis.
  • Matrices - one value per pair of entries on two chosen axes.

Think of it as AnnData generalised to any number of axes, with typed scalars, on-disk persistence in several formats, and a query DSL for composing reads.

Build a daf in memory

d <- memory_daf(name = "demo")
add_axis(d, "cell", c("c1", "c2", "c3"))
add_axis(d, "gene", c("g1", "g2"))
set_scalar(d, "organism", "human")
set_vector(d, "cell", "donor", c("A", "B", "A"))
set_matrix(d, "cell", "gene", "UMIs",
           matrix(1:6, nrow = 3, ncol = 2))
print(d)
#> <dafr::MemoryDaf>
#>  @ name                  : chr "demo"
#>  @ internal              :<environment: 0x5619227594c0> 
#>  @ cache                 :<environment: 0x561922796490> 
#>  @ axis_version_counter  :<environment: 0x56192279c848> 
#>  @ vector_version_counter:<environment: 0x56192279c570> 
#>  @ matrix_version_counter:<environment: 0x56192279c298>

description(d) produces a more detailed dump of axes, vectors, matrices, and scalars, useful for inspecting unfamiliar dafs:

cat(description(d))
#> name: demo
#> type: MemoryDaf
#> scalars:
#>   organism: "human"
#> axes:
#>   cell: 3 entries
#>   gene: 2 entries
#> vectors:
#>   cell:
#>     donor
#> matrices:
#>   cell,gene:
#>     UMIs
#>   gene,cell:
#>     UMIs

Read data back

get_scalar(d, "organism")
#> [1] "human"
get_vector(d, "cell", "donor")
#>  c1  c2  c3 
#> "A" "B" "A"
get_matrix(d, "cell", "gene", "UMIs")
#>    g1 g2
#> c1  1  4
#> c2  2  5
#> c3  3  6

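To discover what a daf contains, scalars_set(d) lists the names of its scalars, vectors_set(d, axis) lists the names of the vectors stored on an axis, and matrices_set(d, a, b) lists the names of the matrices stored on a pair of axes.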
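As a minimal sketch of those three listing functions, the block below rebuilds the demo daf and queries each one. It uses only calls already shown in this vignette. It assumes the results behave like character vectors, as the axes_set() output elsewhere in this vignette suggests, so no printed output is shown.

```r
library(dafr)

# Rebuild the demo daf so this block runs on its own.
d <- memory_daf(name = "demo")
add_axis(d, "cell", c("c1", "c2", "c3"))
add_axis(d, "gene", c("g1", "g2"))
set_scalar(d, "organism", "human")
set_vector(d, "cell", "donor", c("A", "B", "A"))
set_matrix(d, "cell", "gene", "UMIs",
           matrix(1:6, nrow = 3, ncol = 2))

# Names of the scalars stored in the daf (here: organism).
scalar_names <- scalars_set(d)

# Names of the vectors stored on the cell axis (here: donor).
cell_vector_names <- vectors_set(d, "cell")

# Names of the matrices stored on the (cell, gene) axis pair (here: UMIs).
cell_gene_matrix_names <- matrices_set(d, "cell", "gene")
```

Checking membership with `%in%` (for example `"donor" %in% cell_vector_names`) is a simple way to guard a read before calling get_vector().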

Queries

Queries let you compose reads. Two equivalent forms - a string DSL and pipe-chain builders - share the same execution engine:

d[". organism"]
#> [1] "human"
d[Axis("cell") |> LookupVector("donor")]
#>  c1  c2  c3 
#> "A" "B" "A"
# Mean UMI count per cell (reduce across the gene axis):
d["@ cell @ gene :: UMIs >| Mean"]
#>  c1  c2  c3 
#> 2.5 3.5 4.5
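As a sanity check on the last query, the same per-cell means can be reproduced in base R without dafr. This is only a sketch and not part of the dafr API: the matrix `m` is rebuilt by hand from the values used in the demo, and `rowMeans()` stands in for the `>| Mean` reduction across the gene axis.

```r
# Rebuild the demo UMIs matrix by hand (cells in rows, genes in columns).
# matrix() fills column by column, so g1 holds 1:3 and g2 holds 4:6.
m <- matrix(1:6, nrow = 3, ncol = 2,
            dimnames = list(c("c1", "c2", "c3"), c("g1", "g2")))

# Averaging each row collapses the gene axis, leaving one value per cell.
per_cell_mean <- rowMeans(m)
print(per_cell_mean)
#>  c1  c2  c3 
#> 2.5 3.5 4.5
```

The column-major fill also explains why row c1 of the matrix reads 1, 4 and not 1, 2.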

See vignette("queries", package = "dafr") for the practical tour and vignette("query-dsl-reference", package = "dafr") for the full operator grammar.

Data frames and dplyr

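get_dataframe() materialises the vectors of an axis as a data frame, with one row per axis entry (the entry names become the row names) and one column per vector: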

get_dataframe(d, "cell")
#>    donor
#> c1     A
#> c2     B
#> c3     A

The dplyr backend lets you treat any axis as a tbl and pipe it through dplyr verbs:

library(dplyr, warn.conflicts = FALSE)
tbl(d, "cell") |>
    group_by(donor) |>
    summarise(n = n())
#> # A tibble: 2 × 2
#>   donor     n
#>   <chr> <int>
#> 1 A         2
#> 2 B         1

See vignette("dplyr", package = "dafr") for the supported verbs and write-back semantics.

Persistence

A daf can be persisted in three on-disk shapes - pick the one that matches how the data needs to travel.

# files_daf: one file per property, mmap-backed reads.
path <- tempfile("dafr-")
fd <- files_daf(path, mode = "w+", name = "persisted")
copy_all(fd, d)  # destination first, then source
list.files(path, recursive = TRUE)
#>  [1] "axes/cell.txt"                "axes/gene.txt"               
#>  [3] "axes/metadata.json"           "daf.json"                    
#>  [5] "matrices/cell/gene/UMIs.data" "matrices/cell/gene/UMIs.json"
#>  [7] "matrices/gene/cell/UMIs.data" "matrices/gene/cell/UMIs.json"
#>  [9] "metadata.zip"                 "scalars/organism.json"       
#> [11] "vectors/cell/donor.json"      "vectors/cell/donor.txt"
# Reopen read-only:
fd2 <- files_daf(path, mode = "r")
get_scalar(fd2, "organism")
#> [1] "human"
get_matrix(fd2, "cell", "gene", "UMIs")
#>    g1 g2
#> c1  1  4
#> c2  2  5
#> c3  3  6

zarr_daf() (a directory or a .daf.zarr.zip archive) and http_daf() (read-only access over HTTP) expose the same data through the same API; see vignette("zarr", package = "dafr").

Where to go next

Example data

d2 <- example_cells_daf()
axes_set(d2)
#> [1] "cell"       "donor"      "experiment" "gene"