dafr exposes a dplyr
backend for the axes of a daf. Each axis becomes a
tbl whose rows are the axis entries and whose columns are
the vectors defined on that axis. Matrices are intentionally
not exposed through this interface — use the native query DSL
(vignette("queries", package = "dafr")) for those.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
d <- example_cells_daf()
tbl(d, "cell")
#> <daf_axis_tbl> axis: cell [856 rows]
#> # A tibble: 6 × 3
#> name donor experiment
#> <chr> <chr> <chr>
#> 1 demux_07_12_20_1_AACAAGATCCATTTCA-1 N89 demux_07_12_20_1
#> 2 demux_07_12_20_1_AACGAAAGTCCAATCA-1 N84 demux_07_12_20_1
#> 3 demux_07_12_20_1_AAGACAAAGTTCCGTA-1 N86 demux_07_12_20_1
#> 4 demux_07_12_20_1_AGACTCATCTATTGTC-1 N84 demux_07_12_20_1
#> 5 demux_07_12_20_1_AGATAGACATTCCTCG-1 N89 demux_07_12_20_1
#> 6 demux_07_12_20_1_ATCGTAGTCCAGTGCG-1 N89 demux_07_12_20_1
Familiar verbs
filter, select, mutate,
arrange, summarise, group_by,
distinct, and pull all work as you’d expect.
collect() materializes the tbl as a tibble.
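As a minimal sketch of such a chain, using only the columns visible in the printout of tbl(d, "cell") (name, donor, experiment) and assuming the verbs behave as they do on an ordinary tibble:

```r
library(dafr)
library(dplyr)

d <- example_cells_daf()

# Keep the cells of donor N89, order them by experiment, and
# materialize the result as an ordinary tibble.
n89_cells <- tbl(d, "cell") |>
  filter(donor == "N89") |>
  arrange(experiment) |>
  select(name, experiment) |>
  collect()

# pull() extracts a single column as a plain vector.
donors <- tbl(d, "cell") |>
  distinct(donor) |>
  pull(donor)
```

Everything before collect() stays a daf_axis_tbl; only the final step hands back a plain tibble.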
Write-back is explicit
mutate() stores computed columns in memory; it does not
touch the daf. Persisting them as vectors on the axis requires an
explicit compute(vectors = ...) — dafr never silently
writes to the daf.
The daf backend reuses dplyr::compute() (with an extra
vectors = ... argument) rather than introducing a new
generic, so it doesn’t shadow dplyr::compute() for dbplyr
users.
compute() errors if the tbl has been filtered (partial
row mask) — you cannot write back a partial vector. A permuted-but-full
row mask (e.g. after arrange()) is un-permuted to axis
order before writing.
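The write-back rules can be sketched as follows. The column name donor_experiment is made up for illustration, and the sketch assumes the example daf is writable; only compute(vectors = ...) and the error on a filtered tbl come from the description above.

```r
library(dafr)
library(dplyr)

d <- example_cells_daf()  # assumed writable for this sketch

# mutate() alone only adds an in-memory column; d itself is unchanged.
# donor_experiment is a hypothetical column name.
labelled <- tbl(d, "cell") |>
  mutate(donor_experiment = paste(donor, experiment, sep = "/"))

# A filtered tbl covers only part of the axis, so write-back is refused.
# The error message is captured instead of aborting the script.
refused <- tryCatch(
  labelled |>
    filter(donor == "N89") |>
    compute(vectors = "donor_experiment"),
  error = function(e) conditionMessage(e)
)

# On the full axis, persisting is an explicit step that names the columns.
labelled |>
  arrange(donor) |>  # permuted but complete: written back in axis order
  compute(vectors = "donor_experiment")
```

Keeping the write in a separate, named step means a pipeline can be explored freely without any risk of altering the underlying data.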
Grouping that looks up an axis
If the grouping variable names an existing axis in the daf, the
result of summarise() is a daf_axis_tbl keyed
to that axis. You can keep piping dplyr verbs, and later
compute() the derived columns back as vectors on that
axis.
# "donor" is an axis; the summarise result is itself tbl(d, "donor")-like.
tbl(d, "cell") |>
  group_by(donor) |>
  summarise(mean_umis = mean(n_umis)) |>
  compute(vectors = "mean_umis") # persists on the donor axis
If the grouping variable is just a vector (not an axis),
summarise() returns a plain tibble.
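For contrast, a sketch of the non-axis case. The grouping column donor_group is a hypothetical column derived with mutate() and is assumed not to match any axis name in the daf, so the result should be a plain tibble rather than a daf_axis_tbl.

```r
library(dafr)
library(dplyr)

d <- example_cells_daf()

per_group <- tbl(d, "cell") |>
  # donor_group is a made-up column, not an axis of d
  mutate(donor_group = if_else(donor == "N89", "N89", "other")) |>
  group_by(donor_group) |>
  summarise(n_experiments = n_distinct(experiment))
```

Because there is no axis to key the result to, there is nothing to compute() back; the tibble is the end of the pipeline.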
More verbs
Beyond the core set above, the backend also supports
slice and the slice_head /
slice_tail / slice_min /
slice_max / slice_sample family;
rename and relocate; count /
tally / add_count / add_tally;
transmute and reframe. Inside
mutate() and summarise(), common dplyr helpers
work via delegation:
- Window functions: lag, lead, cumsum, row_number, min_rank, dense_rank, ntile, percent_rank.
- Scalar helpers: if_else, case_when, coalesce, n_distinct, first, last, nth.
- across(where(is.numeric), mean)-style column-wise ops.
- Tidyselect helpers in select(): starts_with, contains, matches, where.
.by = ... (dplyr 1.1+) works on filter() /
mutate() / summarise(); a summarise() whose .by names a single axis
returns a daf_axis_tbl keyed to that axis, the same way
group_by() does. mutate(.keep = "none") is respected.
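A few of these verbs and helpers in combination, as a sketch that uses only the cell-axis columns from the first printout. The derived column names (is_n89, n_experiments) are illustrative, not part of the example data.

```r
library(dafr)
library(dplyr)

d <- example_cells_daf()
cells <- tbl(d, "cell")

# First three axis entries.
first_three <- cells |> slice_head(n = 3)

# Number of cells per donor.
cells_per_donor <- cells |> count(donor)

# Scalar helper inside mutate(); .keep = "none" drops the unused columns.
flags <- cells |>
  mutate(is_n89 = if_else(donor == "N89", "yes", "no"), .keep = "none")

# Tidyselect helper in select().
d_columns <- cells |> select(starts_with("d"))

# Per-call grouping with .by (dplyr 1.1+); "donor" names an axis here.
experiments_per_donor <- cells |>
  summarise(n_experiments = n_distinct(experiment), .by = donor)
```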
Not supported (yet)
- Matrices. Use Axis(...) |> LookupMatrix(...) from the query DSL.
- Joins (inner_join, left_join, right_join, full_join, semi_join, anti_join, cross_join, nest_join). A daf_axis_tbl has no meaningful join semantics against another table in v1. Calling any of these produces a helpful error pointing you at dplyr::collect().
- Set operations (union, union_all, intersect, setdiff): same story.
- DSL pushdown for filter / summarise. Verbs currently materialize via get_vector, which is cheap for memory daf but not optimal for large FilesDaf with selective filters.
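Until joins are supported, the workaround that the error message points to can be sketched like this: collect() the axis into a tibble first, then join as usual. donor_info is a hypothetical lookup table invented for the example, and its cohort values are made up.

```r
library(dafr)
library(dplyr)

d <- example_cells_daf()

# Hypothetical per-donor annotations that live outside the daf.
donor_info <- tibble(
  donor = c("N84", "N86", "N89"),
  cohort = c("a", "b", "a")
)

# collect() yields a plain tibble, so ordinary dplyr joins apply.
cells_annotated <- tbl(d, "cell") |>
  collect() |>
  left_join(donor_info, by = "donor")
```

The joined result is an ordinary tibble and is no longer tied to the daf, so it cannot be passed to compute(vectors = ...).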