This article is the normative reference for the dafr query language. The semantics follow the Julia reference implementation DataAxesFormats.jl. The short Query DSL vignette is a tutorial; this article is the specification.
The DSL has two surface forms:
- a string form (daf["@ cell : donor"]), tokenised and parsed.
- a builder form: pipe-composable R objects (daf[Axis("cell") |> LookupVector("donor")]).
The two forms are interchangeable: any string is parsed into the same AST the builders produce, so examples below give both forms for the first occurrence of each phrase and then use whichever is clearer.
Execution model
A query is a sequence of operators that is not executed one operator at a time. Instead, the parser groups operators into phrases, where each phrase is a stack-rewriting step:
- The query state is a stack, starting empty.
- Each phrase matches a pattern on the top of the stack (for example, a LookupVector phrase needs an axis specification on top).
- The phrase pops the matching top elements, performs its operation, and pushes zero or more new elements.
- When all operators have been consumed, the stack must hold exactly one result; that result is what the query returns.
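The stack-rewriting steps above can be illustrated with a small base-R toy. This is a sketch only, not dafr's implementation: the `toy` data, `phrase_axis()`, `phrase_lookup_vector()` and `run_toy_query()` are hypothetical names invented for this example, and only two phrases are modelled.

```r
# Toy model of stack rewriting; NOT dafr's implementation.
# Hypothetical data standing in for a daf: one axis with one vector property.
toy <- list(
  axes = list(cell = c("c1", "c2", "c3")),
  vectors = list(cell = list(donor = c("N89", "N84", "N89")))
)

# Phrase "@ axis": push an axis specification onto the stack.
phrase_axis <- function(stack, name) {
  stopifnot(name %in% names(toy$axes))
  c(stack, list(list(kind = "axis", name = name)))
}

# Phrase ": property": needs an axis specification on top of the stack.
# It pops the axis and pushes the named vector in its place.
phrase_lookup_vector <- function(stack, property) {
  stopifnot(length(stack) >= 1)
  top <- stack[[length(stack)]]
  stopifnot(identical(top$kind, "axis"))
  values <- toy$vectors[[top$name]][[property]]
  names(values) <- toy$axes[[top$name]]
  c(stack[-length(stack)], list(list(kind = "vector", values = values)))
}

# Apply the phrases in order, then require exactly one element on the stack.
run_toy_query <- function(phrases) {
  stack <- list()
  for (phrase in phrases) stack <- phrase(stack)
  stopifnot(length(stack) == 1)
  stack[[1]]
}

# Toy equivalent of "@ cell : donor".
result <- run_toy_query(list(
  function(s) phrase_axis(s, "cell"),
  function(s) phrase_lookup_vector(s, "donor")
))
result$values
```

Running `phrase_lookup_vector()` on an empty stack, or on a stack whose top is not an axis, stops with an error, which mirrors the requirement that a phrase's pattern must match the top of the stack.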
Because operators are grouped into phrases, an operator’s meaning depends on the phrase it participates in. In the operator table below, the third column names the phrase each operator appears in and what it does there.
Operator table
| Operator | Builder | Phrase / meaning |
|---|---|---|
| @ | Axis() | Introduce an axis onto the stack. |
| =@ | AsAxis() | Treat subsequent values as entries of that axis (for secondary lookup / pivoting). |
| @\| | SquareColumnIs() | Slice a square matrix by column. |
| @- | SquareRowIs() | Slice a square matrix by row. |
| / | GroupBy() | Group a vector by values of another vector of the same length. |
| \|/ | GroupColumnsBy() | Group matrix columns by a vector with one value per column. |
| -/ | GroupRowsBy() | Group matrix rows by a vector with one value per row. |
| % | eltwise op (Abs, …) | Apply an element-wise operation to vector or matrix data. |
| >> | reduction op (Sum, …) | Reduce a vector or matrix to a scalar (per-group vector on a grouped input). |
| >\| | ReduceToColumn() | Reduce a matrix along rows to a single column vector. |
| >- | ReduceToRow() | Reduce a matrix along columns to a single row vector. |
| \|\| | IfMissing() | Default value when a lookup target does not exist, or when reducing empty data. |
| ?? | IfNot() | Fallback value when a chained lookup produces an empty result. |
| * | CountBy() | Count, in a matrix, the number of times each combination of values from two vectors appears. |
| ? | Names() | Return the set of axis / property names reachable from the current stack state. |
| . | LookupScalar() | Look up a scalar property. |
| : | LookupVector() | Look up a vector property on the current axis. |
| :: | LookupMatrix() | Look up a matrix property on a pair of axes. |
| < | IsLess() | Mask comparison: strictly less. |
| <= | IsLessEqual() | Mask comparison: less or equal. |
| = | IsEqual() | Mask comparison: equal. |
| != | IsNotEqual() | Mask comparison: not equal. |
| >= | IsGreaterEqual() | Mask comparison: greater or equal. |
| > | IsGreater() | Mask comparison: strictly greater. |
| ~ | IsMatch() | Mask comparison: regex match. |
| !~ | IsNotMatch() | Mask comparison: regex non-match. |
| [ | BeginMask() | Start a mask sub-expression on the current axis. |
| [ ! | BeginNegatedMask() | Start a negated mask on the current axis. |
| ] | EndMask() | Finish the mask; the axis becomes filtered. |
| & | AndMask() | Combine masks with AND. |
| & ! | AndNegatedMask() | Combine masks with AND NOT. |
| \| | OrMask() | Combine masks with OR. |
| \| ! | OrNegatedMask() | Combine masks with OR NOT. |
| ^ | XorMask() | Combine masks with XOR. |
| ^ ! | XorNegatedMask() | Combine masks with XOR NOT. |
Element-wise operation builders include Abs,
Clamp, Convert, Fraction,
Log, Round, Significant.
Reduction operation builders include Sum,
Mean, Median, Mode,
Quantile, Max, Min,
Count, GeoMean, Std,
StdN, Var, VarN.
The four query kinds
Every query returns one of four result kinds, selected by the phrases it uses.
Names query — ?
Returns the set of names at some point in the data hierarchy.
| Goal | String | Builder |
|---|---|---|
| Names of all scalars | ". ?" | LookupScalar("?") |
| Names of all axes | "@ ?" | Axis("?") |
| Vector properties on gene | "@ gene : ?" | Axis("gene") \|> LookupVector("?") |
| Matrix properties on (cell, gene) | "@ cell @ gene :: ?" | Axis("cell") \|> Axis("gene") \|> LookupMatrix("?") |
cells[". ?"]
#> [1] "organism" "reference"
cells["@ gene : ?"]
#> [1] "is_lateral"
cells["@ cell @ gene :: ?"]
#> [1] "UMIs"
Scalar query — . / ... >> Reduction
Returns a single scalar value. Produced by:
- Looking up a scalar (. name), optionally with a || default and an optional type T to pin the default’s dtype.
- Looking up a vector and picking one entry (: vec @ axis = entry).
- Looking up a matrix and picking one cell (:: m @ rows = R @ cols = C).
- Reducing any vector to a scalar (... >> Reduction).
- Reducing any matrix to a scalar (... >> Reduction).
cells[". organism"]
#> [1] "human"
cells[". nope || fallback"]
#> [1] "fallback"
cells[". nope || 0 type Int64"]
#> integer64
#> [1] 0
# Pick one entry from a vector.
cells["@ donor : age @ donor = N89"]
#> [1] 55
# Number of genes marked as lateral (reduce a vector of Bool).
cells["@ gene : is_lateral >> Sum type Int64"]
#> [1] 438
# Total UMIs in the whole matrix.
cells["@ cell @ gene :: UMIs >> Sum type Int64"]
#> [1] 1171936
Vector query
Returns a vector. Phrase kinds:
- Axis entries: @ axis.
- Axis entries after masking: @ axis [ ... ].
- Property lookup along an axis: @ axis : property.
- Any of the above with an element-wise op appended: @ axis : property % Op.
- Reduction of a matrix along one dimension: @ rows @ cols :: m >- Op (row result) or >| Op (column result).
head(cells["@ experiment"])
#> demux_01_02_21_1 demux_01_02_21_2 demux_01_03_21_1 demux_04_01_21_1
#> "demux_01_02_21_1" "demux_01_02_21_2" "demux_01_03_21_1" "demux_04_01_21_1"
#> demux_04_01_21_2 demux_07_03_21_1
#> "demux_04_01_21_2" "demux_07_03_21_1"
length(cells["@ gene [ ! is_lateral ]"])
#> [1] 245
head(cells["@ cell : donor"])
#> demux_07_12_20_1_AACAAGATCCATTTCA-1 demux_07_12_20_1_AACGAAAGTCCAATCA-1
#> "N89" "N84"
#> demux_07_12_20_1_AAGACAAAGTTCCGTA-1 demux_07_12_20_1_AGACTCATCTATTGTC-1
#> "N86" "N84"
#> demux_07_12_20_1_AGATAGACATTCCTCG-1 demux_07_12_20_1_ATCGTAGTCCAGTGCG-1
#> "N89" "N89"
# Per-cell mean UMIs over genes: >- reduces the gene x cell matrix to one row.
head(cells["@ gene @ cell :: UMIs >- Mean"])
#> demux_07_12_20_1_AACAAGATCCATTTCA-1 demux_07_12_20_1_AACGAAAGTCCAATCA-1
#> 3.351391 4.535871
#> demux_07_12_20_1_AAGACAAAGTTCCGTA-1 demux_07_12_20_1_AGACTCATCTATTGTC-1
#> 2.411420 4.131772
#> demux_07_12_20_1_AGATAGACATTCCTCG-1 demux_07_12_20_1_ATCGTAGTCCAGTGCG-1
#> 2.641288 3.929722
Masks
A mask filters an axis to the entries for which a predicate holds.
Phrase structure: @ axis [ <mask-body> ].
The mask body starts with a property lookup (BeginMask)
and is followed by zero or more comparisons and
mask combinators.
# Donors older than 30.
length(cells[Axis("donor") |>
BeginMask("age") |> IsGreater(30) |> EndMask()])
#> [1] 93
# Same in string form.
length(cells["@ donor [ age > 30 ]"])
#> [1] 93
# Negated mask: genes that are NOT lateral.
length(cells["@ gene [ ! is_lateral ]"])
#> [1] 245
# Combined masks: age > 60 AND sex = male (on donor axis).
length(cells["@ donor [ age > 60 & sex = male ]"])
#> [1] 29
Supported comparisons: <, <=, =, !=, >=, >, ~ (regex match), !~ (regex non-match).
Mask combinators: &, & !,
|, | !, ^, ^ !. They
are evaluated left to right — they are not grouped by
precedence. Use separate masks with explicit combinators when order
matters.
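The difference between left-to-right evaluation and precedence-based grouping can be seen with three logical vectors in base R. This is an illustration only, not dafr code: `combine_left_to_right()` is a hypothetical helper written for this example, and the vectors stand in for per-entry mask results.

```r
# Base-R illustration only; this is not how dafr evaluates masks internally.
a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(FALSE, FALSE, FALSE, TRUE)
d <- c(FALSE, TRUE, TRUE, FALSE)

# Fold (combinator, mask) steps strictly left to right onto the first mask.
combine_left_to_right <- function(first, steps) {
  Reduce(function(acc, step) step$op(acc, step$mask), steps, first)
}

# "a | b & d" read left to right means (a | b) & d.
left_to_right <- combine_left_to_right(
  a,
  list(list(op = `|`, mask = b), list(op = `&`, mask = d))
)

# Plain R gives & higher precedence than |, so it computes a | (b & d).
r_precedence <- a | b & d

left_to_right
r_precedence
```

The two results differ, so a mask written with R's precedence rules in mind may select different entries than intended once it is read left to right.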
Defaults and fallbacks
Two operators supply fallback values:
- || value (IfMissing): used when the thing being looked up does not exist, or when reducing an empty vector / matrix.
- ?? value (IfNot): used during chained lookups (: a =@ : b) when the intermediate key is empty.
cells[". nope || fallback"]
#> [1] "fallback"
# Fallback when a chained lookup runs out of pivot values.
head(metacells["@ cell : metacell =@ ?? none : type"])
#> demux_07_12_20_1_AACAAGATCCATTTCA-1 demux_07_12_20_1_AACGAAAGTCCAATCA-1
#> "MEBEMP-E" "none"
#> demux_07_12_20_1_AAGACAAAGTTCCGTA-1 demux_07_12_20_1_AGACTCATCTATTGTC-1
#> "MEBEMP-E" "MPP"
#> demux_07_12_20_1_AGATAGACATTCCTCG-1 demux_07_12_20_1_ATCGTAGTCCAGTGCG-1
#> "MEBEMP-E" "none"
Element-wise and reduction operations
Element-wise ops are applied with %; reductions on a
matrix with >- (reduce to row) or >|
(reduce to column). Both take keyword-style sub-arguments separated by
spaces:
daf["... % Log base 2 eps 1e-5"]
daf["... >- Mean"]
daf["... >| Sum type Int64"]
Built-in element-wise: Abs, Clamp low h, Convert type T, Fraction, Log base b eps e, Round, Significant digits n.
Built-in reductions: Sum, Mean,
Median, Mode, Quantile p P,
Max, Min, Count,
GeoMean, Std, StdN,
Var, VarN.
User code can extend either set:
register_eltwise("MySquare", function(x) x * x)
register_reduction("MyMax", function(x) max(x, na.rm = TRUE))
Grouping
Grouping appears in three phrase contexts:
- GroupBy (/): reduce a vector to groups defined by another vector of equal length, yielding a shorter vector keyed by group.
- GroupRowsBy (-/): group the rows of a matrix and emit one row per group.
- GroupColumnsBy (|/): group the columns of a matrix and emit one column per group.
The matrix-group forms must be paired with a compatible reduction
(-/ pairs with >-, |/ with
>|).
# Mean per (donor, gene) — collapse cells into donor rows.
dim(cells["@ cell @ gene :: UMIs -/ donor >- Mean"])
#> [1] 95 683
Appending =@ after a group property changes the result to be ordered by that axis’s entries (rather than by unique group value). Appending || default before the reduction supplies a value for empty groups.
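For intuition, grouping matrix rows and reducing each group with a mean has a close base-R analogue. The sketch below is not dafr code; the `umis` matrix and `donor` vector are small hypothetical stand-ins for a cell x gene matrix and a per-cell group property.

```r
# Base-R analogue of grouping matrix rows and averaging each group.
# Hypothetical 4 cells x 2 genes matrix (filled column by column).
umis <- matrix(
  c(1, 3, 5, 7,
    2, 4, 6, 8),
  nrow = 4,
  dimnames = list(c("c1", "c2", "c3", "c4"), c("g1", "g2"))
)

# Hypothetical group property: one donor per cell (per matrix row).
donor <- c(c1 = "N84", c2 = "N89", c3 = "N84", c4 = "N89")

# Sum the rows of each group, then divide by the group size to get the mean.
group_sums <- rowsum(umis, group = donor)
group_sizes <- as.vector(table(donor)[rownames(group_sums)])
group_means <- group_sums / group_sizes

# One row per donor, one column per gene.
dim(group_means)
group_means
```

As in the query result, the output has one row per distinct group value and keeps the original columns.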
Axis-as-value and chained lookup (=@)
=@ declares that a set of values should be interpreted
as entries of a named axis. That lets the query walk one axis into
another:
# Cell -> metacell -> type: pipe values through the metacell axis.
head(metacells["@ cell : metacell =@ ?? none : type"])
#> demux_07_12_20_1_AACAAGATCCATTTCA-1 demux_07_12_20_1_AACGAAAGTCCAATCA-1
#> "MEBEMP-E" "none"
#> demux_07_12_20_1_AAGACAAAGTTCCGTA-1 demux_07_12_20_1_AGACTCATCTATTGTC-1
#> "MEBEMP-E" "MPP"
#> demux_07_12_20_1_AGATAGACATTCCTCG-1 demux_07_12_20_1_ATCGTAGTCCAGTGCG-1
#> "MEBEMP-E" "none"
In the above, : metacell =@ says “these values are entries of the metacell axis”, then : type looks up the type property on that axis. The ?? none fallback covers cells without an assigned metacell.
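The cell -> metacell -> type walk can be mimicked with named vectors in base R. This is a sketch under stated assumptions, not dafr's implementation: `chained_lookup()` is a hypothetical helper, the data are invented, and an empty string is assumed to mark a cell without an assigned metacell.

```r
# Base-R analogue of a chained lookup with a fallback; not dafr code.
# Hypothetical data: "" marks a cell with no assigned metacell.
metacell_of_cell <- c(c1 = "M1", c2 = "", c3 = "M2", c4 = "M1")
type_of_metacell <- c(M1 = "MEBEMP-E", M2 = "MPP")

chained_lookup <- function(keys, target, fallback) {
  # Start with the fallback everywhere, keeping the first axis's names.
  result <- rep(fallback, length(keys))
  names(result) <- names(keys)
  # Only non-empty keys are treated as entries of the second axis.
  has_key <- nzchar(keys)
  result[has_key] <- unname(target[keys[has_key]])
  result
}

cell_types <- chained_lookup(metacell_of_cell, type_of_metacell, "none")
cell_types
```

The result keeps one value per cell, with the fallback filling the positions whose intermediate key was empty.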
SquareColumnIs (@|) and
SquareRowIs (@-) slice a square matrix by a
specific column / row of that axis; the right-hand side of each is a
value of the matrix’s square axis.
Parsing, canonicalisation, and equivalence
parse_query(s) returns the builder AST;
canonical_query(q) returns a normalised string; both forms
of the same query produce the same AST.
s <- "@ gene : is_lateral"
q_bld <- Axis("gene") |> LookupVector("is_lateral")
identical(canonical_query(s), as.character(q_bld))
#> [1] TRUE
Inspection helpers:
- has_query(daf, q): TRUE if q could be evaluated against daf.
- is_axis_query(q): TRUE if the result is an axis.
- query_axis_name(q): the axis name (if applicable).
- query_result_dimensions(q): 0 (scalar), 1 (vector), 2 (matrix).
- query_requires_relayout(q): whether a matrix relayout is needed.
String literals and escaping
String literal values inside a query can include spaces and most
punctuation. Characters that collide with the DSL’s own tokens
(@, :, |, [,
], …) must be escaped. Use escape_value() to
produce a safe literal and unescape_value() to reverse
it.
escape_value("a | b")
#> [1] "\"a | b\""
Caching
get_query(daf, q) caches the result as
QueryData inside the daf’s internal cache. Use
daf[q] for a one-shot evaluation that does
not cache. empty_cache(daf) releases all
cached results and is also invoked automatically on every mutating write
to the daf so cached results never become stale.
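The caching contract described here (cache on request, invalidate on every write) can be sketched with a plain environment. This is a toy model only and says nothing about dafr's internals: `new_toy_store()`, `toy_eval()`, `toy_get()` and `toy_set()` are hypothetical names, and the "query" is simply the sum of a stored vector.

```r
# Toy cache keyed by name; not dafr's internals.
new_toy_store <- function(values) {
  store <- new.env()
  store$values <- values
  store$cache <- list()
  store$evaluations <- 0L
  store
}

# One-shot evaluation that never touches the cache.
toy_eval <- function(store, name) {
  store$evaluations <- store$evaluations + 1L
  sum(store$values[[name]])
}

# Cached evaluation: compute once, then serve the stored result.
toy_get <- function(store, name) {
  if (is.null(store$cache[[name]])) {
    store$cache[[name]] <- toy_eval(store, name)
  }
  store$cache[[name]]
}

# Any write empties the cache so cached results cannot go stale.
toy_set <- function(store, name, value) {
  store$values[[name]] <- value
  store$cache <- list()
  invisible(store)
}

store <- new_toy_store(list(umis = c(1, 2, 3)))
first <- toy_get(store, "umis")        # computed
again <- toy_get(store, "umis")        # served from the cache
toy_set(store, "umis", c(10, 20))      # write invalidates the cache
after_write <- toy_get(store, "umis")  # recomputed
c(first, again, after_write, store$evaluations)
```

Emptying the whole cache on each write is the simplest way to guarantee freshness; it trades some recomputation for not having to track which cached results depend on which data.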
Further reading
- Query DSL tutorial — short introduction.
- Reference: Query DSL (string form) — per-function docs for parsing, inspection, and escape helpers.
- Reference: Query builders — per-function docs for each builder.
- Upstream spec in DataAxesFormats.jl/src/queries.jl for semantic cross-reference.