Changelog
Source: NEWS.md
dafr 0.4.9
Maintenance release - no user-facing changes. Drops a non-portable test assertion that compared a freshly re-compressed packed gzip shard byte-for-byte against a fixture; that comparison depends on the platform zlib’s DEFLATE output (R’s zlib vs Julia’s CodecZlib) and failed CI on some runners. Only the framing is guaranteed byte-identical across implementations (still tested); the 0.4.8 packed/sharded write feature is otherwise unchanged.
dafr 0.4.8
Packed/sharded WRITE for ZarrDaf and FilesDaf, byte-compatible with DataAxesFormats.jl (verified read-back in both directions).
Packed/sharded WRITE (ZarrDaf + FilesDaf)
- Writing packed/sharded stores is now supported: zarr_daf(..., packed = TRUE) and files_daf(..., packed = TRUE) produce the same dual-format (“indexed+zipped”) shards DataAxesFormats.jl writes. Packing is per-component and threshold-gated; tune with options(dafr.packed_compression=, dafr.packed_compression_level=, dafr.packed_target_chunk_kb=) (defaults blosc_zstd_bitshuffle, 5, 8). Verified byte-readable by DataAxesFormats.jl 0.3.0 in both directions for the gzip, zstd, and blosc codecs.
- gzip-packed write needs no extra library (CRAN-safe); zstd / blosc_* need the same optional system libzstd / c-blosc probed by configure for packed reads. Requesting a codec whose library is absent raises an actionable error. String properties are always written flat.
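A minimal sketch of setting the tuning options named above, using base R only. The option names come from this entry and the numeric values repeat the stated defaults; passing "gzip" as the codec value is an assumption based on the codec names listed here, not a confirmed spelling.

```r
# Option names as listed in this entry; "gzip" as a value is an assumption.
old <- options(
  dafr.packed_compression = "gzip",
  dafr.packed_compression_level = 5,
  dafr.packed_target_chunk_kb = 8
)
codec <- getOption("dafr.packed_compression")
codec

# Restore the previous settings once the packed write is done.
options(old)
```

Saving the return value of options() and restoring it afterwards keeps the change local to one write, which is useful when flat and packed stores are written in the same session.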
dafr 0.4.7
Continued parity-audit cleanup against DataAxesFormats.jl. Each change is test-driven and the full suite stays green. See dev/parity-audit-2026-06-11/ for the audit trail.
AnnData interop
- Read AnnData >= 0.12 nullable-string-array data. AnnData 0.12 stores strings as a {values, mask} group rather than a plain string dataset. This encoding is used for the obs / var _index, for categorical categories, and for plain pandas string columns - so a real 0.12 .h5ad previously failed to load at the very first index read, and string columns were silently skipped. h5ad_as_daf() now reads all three (masked entries become NA).
- Read AnnData >= 0.12 nullable-integer / nullable-boolean columns. NA-bearing pandas Int64 / boolean columns use the same {values, mask} group encoding and are now read as integer / logical vectors with NA for masked entries.
Persistent chains
- complete_daf() scopes a stored base_daf_view to the base only. On reopen, the view now wraps just the base sub-chain with the leaf chained on top (matching how complete_chain() writes it). Previously the whole chain was wrapped in the view, which hid leaf-local data on a renamed view axis (“missing vector”), made a leaf override return the base value, and left the leaf non-writable under mode = "r+".
FilesDaf storage
- Sparse matrices/vectors write the narrowest on-disk index type. A small sparse store (index range <= 65535) now writes UInt16 colptr / rowval instead of always UInt32, matching Julia’s indtype_for_size. The in-memory representation is unchanged; only the on-disk descriptor narrows. Existing stores written with UInt32 continue to read correctly.
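The width selection described above can be sketched as follows. This is a hypothetical helper, not dafr's or Julia's implementation: only the 65535 threshold and the UInt16 / UInt32 names come from this entry, and the UInt64 tier is an assumption added to make the function total.

```r
# Hypothetical helper (not dafr's API): pick the narrowest unsigned index
# type whose maximum value covers the largest index to be stored.
narrowest_index_type <- function(max_index) {
  stopifnot(length(max_index) == 1, max_index >= 0)
  if (max_index <= 65535) {              # 2^16 - 1, the threshold named above
    "UInt16"
  } else if (max_index <= 4294967295) {  # 2^32 - 1
    "UInt32"
  } else {
    "UInt64"                             # assumed tier, not stated in the entry
  }
}

small <- narrowest_index_type(1000)
large <- narrowest_index_type(70000)
```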
dafr 0.4.6
Documentation hotfix on top of 0.4.5 (no functional change). The relayout default flip to TRUE in 0.4.5 left two R CMD check issues that the test suite does not exercise:
- relayout_matrix() example fixed. Its @examples stored a matrix and then relayouted it; with relayout = TRUE now the default, set_matrix() already wrote both layouts, so the example errored with “existing matrix”. The example now passes relayout = FALSE to the initial set_matrix(), so it still demonstrates physically materializing the transpose.
- Regenerated man/*.Rd. set_matrix.Rd had a codoc mismatch (\usage still showed relayout = FALSE); concatenate.Rd and read_only.Rd were also re-synced to the 0.4.5 behavior (M2 sparse-collect heuristic, C1 read-only identity).
dafr 0.4.5
Parity-audit release: an exhaustive differential sweep against DataAxesFormats.jl 0.3.0 (55 confirmed divergences triaged) produced the fixes below. Each is test-driven (failing-first test, minimal fix) and the full suite stays green. See dev/parity-audit-2026-06-11/ for the audit trail.
AnnData interop
- Dense /X and dense layers use the canonical AnnData (n_obs, n_var) orientation. Previously the dense matrix was written and read with no transpose, producing a (n_var, n_obs) on-disk /X: a real scanpy / anndata file failed to load (dimension mismatch) and dafr-written h5ad files were transposed relative to the ecosystem. The reader now reshapes from the known axis lengths (robust to hdf5r dropping a singleton dimension to a vector), and the writer emits the AnnData encoding attributes (array on /X and layers; dataframe with _index + column-order on obs / var) so row and column names round-trip instead of falling back to 0..n-1. Sparse /X (explicit shape / indptr / indices) was already correct and is unchanged. Verified both directions against Python anndata.
- obsm / varm dense embeddings use the canonical (n_axis, k) orientation, the same fix as /X. Per-obs and per-var embeddings now interoperate with the wider AnnData ecosystem.
Query
- GroupBy / CountBy order group labels bytewise (method = "radix"), matching Julia, instead of by the ambient LC_COLLATE. Group output is now locale-independent.
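To illustrate why method = "radix" makes the ordering locale-independent, here is a small base R example. The labels are made up for illustration; the point is that radix sorting of character vectors uses C-locale (bytewise) order, so uppercase ASCII letters sort before lowercase ones regardless of LC_COLLATE.

```r
labels <- c("beta", "Alpha", "alpha", "Beta")

# method = "radix" sorts character vectors in the C locale (bytewise),
# so the result does not depend on the session's LC_COLLATE.
sorted <- sort(labels, method = "radix")
sorted
# "Alpha" "Beta" "alpha" "beta"
```

A plain sort(labels) may interleave upper and lower case in many locales, which is the kind of locale-dependent group order this change removes.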
Concatenation
- concatenate() errors when a collect-axis source lacks the scalar being collected, instead of silently producing NA.
Readers
- get_vector() errors on a named default whose names mismatch the axis order, rather than silently misaligning the values.
- The reader API exposes the reserved name / index virtual vectors via has_vector() / get_vector(), matching Julia.
Writers and layout
- set_matrix() defaults to relayout = TRUE, matching Julia: storing a matrix now also stores its transposed layout by default. This is a behavior/storage-size change; pass relayout = FALSE to keep a single layout. Internal call sites that immediately relayout (example_*_daf(), the files/zarr converters) were updated so the default does not collide.
Copies
- copy_matrix() transpose-reads a flipped-only source instead of erroring.
- copy_all() copies a both-layouts matrix once (Julia’s columns_axis >= rows_axis guard) instead of hitting an “existing matrix” collision when relayout = TRUE.
- copy_tensor() with empty = NULL skips a missing slice instead of erroring.
Reconstruction
- reconstruct_axis() rewrites the implicit property as a string foreign key into the new axis (Julia’s overwrite_implicit_values condition), and keeps the empties-mapping key for properties that have no empties.
Zarr
- The dense reader reconstructs an elided all-fill chunk from fill_value for both the vector and matrix paths, instead of reading zeros/garbage when a writer omits an all-fill chunk.
Adapters and computations
- Adapter copy-back uses insist = TRUE: a name collision on copy-back now errors instead of silently dropping data.
- computation() threads overwrite into the contractor, so an idempotent re-run with overwrite = TRUE succeeds (COMP-01).
Documented deliberate deviations (intentionally not changed)
- group_names uses an FNV-32 hash rather than Julia’s simhash (shape-parity only); ~ / !~ regex match collects multi-token patterns to allow unescaped metacharacters; contracts keep dafr’s stricter Optional / GuaranteedOutput enforcement (safer than DAF.jl 0.3.0, raised upstream); relayout_matrix() on a write-chain succeeds with both layouts where Julia errors. Unsigned and Float32 widths and UInt64 precision are bounded by R’s native types (documented; not a correctness regression).
dafr 0.4.4
Parity
- Singleton chains return their input unwrapped (C1). chain_reader(list(d)) now delegates to read_only(d), chain_writer(list(d)) (with no explicit name) returns d itself, and read_only() is the identity on data that is already read-only unless a name forces a fresh wrapper - matching DataAxesFormats.jl chains.jl / read_only.jl. Previously every singleton chain allocated a redundant wrapper.
- concatenate() MERGE_COLLECT_AXIS preserves sparse vectors (M2). A collect-axis vector merge now applies Julia’s storage-savings heuristic (sparse_if_saves_storage_fraction, default 0.25): when sparse storage would save at least that fraction, the collected (axis x dataset) result is built as a Matrix::sparseMatrix instead of always materializing dense. Sparse sources contribute their nnz, dense sources their full length; string and bit64 columns stay dense.
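The storage-savings decision for M2 can be sketched roughly as below. This is a hypothetical helper under stated assumptions, not dafr's or Julia's actual formula: it assumes dense storage costs one value per element and sparse storage costs one value plus one index per nonzero, and the byte sizes are illustrative. Only the 0.25 default fraction comes from this entry.

```r
# Hypothetical helper, not dafr's internals. Assumes dense storage costs one
# value per element and sparse storage one value plus one index per nonzero.
sparse_saves_enough <- function(n_elements, nnz,
                                value_bytes = 8, index_bytes = 4,
                                fraction = 0.25) {
  dense_bytes <- n_elements * value_bytes
  sparse_bytes <- nnz * (value_bytes + index_bytes)
  saved <- (dense_bytes - sparse_bytes) / dense_bytes
  saved >= fraction
}

mostly_zero <- sparse_saves_enough(n_elements = 1000, nnz = 100)  # saves 85%
mostly_full <- sparse_saves_enough(n_elements = 1000, nnz = 600)  # saves only 10%
```

Under these assumptions the first call chooses sparse and the second stays dense, since a 10% saving is below the 0.25 threshold.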
dafr 0.4.3
Compatibility
- FilesFormat v1.1 read is now pinned against real Julia output. Reading DataAxesFormats.jl 0.3.0 v1.1 FilesDaf stores has worked since 0.4.0, but the only test against a real Julia-written v1.1 repo skipped without a live Julia env. Added a committed Julia-0.3.0-written flat v1.1 fixture (tests/testthat/fixtures/jf11: scalar + dense/sparse vector + dense/sparse matrix) and an always-on test that reads it - confirming v1.1 interop needs no c-blosc/zstd (Julia writes flat by default; only explicitly blosc-packed components require c-blosc, which fails with an actionable “install c-blosc” message).
Performance
- ZarrDaf axis-name decode is now memoized. format_axis_array / axis_vector previously re-decoded the vlen-utf8 axis-name strings from the store on every call, so every distinct query over an axis re-paid that decode (only an exact-repeat query hit the result cache). On a 4000+2500 fixture this was ~45% of a dense matrix-query’s time. The decoded entries are now cached at the "memory" tier keyed by axis + the axis version stamp (invalidated by delete_axis, same contract as the vector/matrix caches), so distinct queries over the same axes decode once. Measured ~1.6x on a >| Sum query; % Log and any axis-touching query benefit identically. Chains/views over a ZarrDaf inherit the cache automatically.
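The caching idea above, keyed by axis plus a version stamp, can be sketched in base R. All names here are hypothetical and this is not dafr's cache implementation; it only shows how a version-stamped key makes repeated lookups skip the decode while a bumped stamp forces a fresh one.

```r
# Hypothetical names throughout; this is not dafr's cache implementation.
axis_cache <- new.env(parent = emptyenv())
decode_calls <- 0

cached_axis_names <- function(axis, version, decode) {
  key <- paste(axis, version, sep = "@")   # axis + version stamp
  if (is.null(axis_cache[[key]])) {
    axis_cache[[key]] <- decode()          # decode only on a cache miss
  }
  axis_cache[[key]]
}

slow_decode <- function() {
  decode_calls <<- decode_calls + 1        # count real decodes
  c("C1", "C2", "C3")
}

first  <- cached_axis_names("cell", 1, slow_decode)  # decodes
second <- cached_axis_names("cell", 1, slow_decode)  # served from the cache
third  <- cached_axis_names("cell", 2, slow_decode)  # new stamp: decodes again
decode_calls
# 2
```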
dafr 0.4.2
Test coverage + parity
- Expanded test coverage against the DataAxesFormats.jl test suite after a full coverage audit + adversarial R-vs-Julia sweep:
  - Packed FilesDaf read is now exercised across every numeric dtype (Int8..UInt64, Float32/64) plus all-true Bool sparse, NaN / Inf, and vlen-utf8 unicode/control strings (previously only Float64 / Int64).
  - Vector >> reductions on NaN / ±Inf inputs are pinned against DAF.jl.
  - Sparse Float / Bool / Int vector round-trips (memory + files), copy_vector superset empty fill, and group_names determinism.
- Documented divergence: >> Median of a vector containing both NaN and ±Inf returns NaN in dafr vs -Inf in DataAxesFormats.jl (every other reduction on such input is NaN in both). R’s NaN is arguably the more correct answer; matching Julia would require reimplementing its median kernel. See dev/adversarial-parity/FINDINGS.md (E1).
dafr 0.4.1
Packaging
- Shorten the committed packed-fixture directory names (tests/testthat/fixtures/{zpk,fpk}/...) so every path in the source tarball stays under the 100-byte portable limit. R’s internal tar (used by R CMD build on Windows) emits a “storing paths of more than 100 bytes is not portable” warning above it, which the CI’s error-on: warning turned into a failure. No functional change.
dafr 0.4.0
FilesFormat: writes v1.1 (DataAxesFormats.jl 0.3.0 default)
- FilesDaf now writes the v1.1 on-disk format ({"version":[1,1]}), matching DataAxesFormats.jl 0.3.0. The only change from v1.0 is the sparse property JSON descriptor: it now carries a per-component object (nzind / nzval for vectors, colptr / rowval / nzval for matrices), each shaped like a stand-alone dense descriptor with its eltype and n_elements. The binary payload files are byte-identical to v1.0, and the reader accepts both shapes, so existing v1.0 repos still read. Verified round-trip in both directions against DAF 0.3.0 (dafr reads Julia’s v1.1; Julia reads dafr’s).
ZarrDaf: faster bulk writes
- Writing many properties to a directory/in-memory ZarrDaf is now near-linear instead of O(N^2). set_* updates the root consolidated metadata incrementally (editing an in-memory index of node zarr.json strings and re-assembling the root by string concatenation) rather than re-scanning and re-parsing the whole store on every mutation. Writing 400 vectors dropped from ~45 s to ~1.7 s. The store stays consistent on disk after every write (no flush/close step), and the output is unchanged.
ZarrDaf: Zarr v3 (DataAxesFormats.jl 0.3.0 interop)
- ZarrDaf now reads and writes the Zarr v3 on-disk format used by DataAxesFormats.jl 0.3.0 (a single zarr.json per node, c/-prefixed chunk keys, the daf version marker as a root-group attribute, and inline consolidated metadata). Flat (uncompressed) read and write are supported.
- Breaking: the legacy Zarr v2 reader/writer is removed. Opening a Zarr v2 .daf.zarr now errors with a conversion hint (python -m zarr v2_to_v3), matching DataAxesFormats.jl 0.3.0’s own behaviour.
ZarrDaf: packed/sharded v3 read
- Reading packed/sharded (packed=true) Zarr v3 .daf.zarr stores is now supported (read-only; dafr still writes flat). Each packed array’s start-located shard index (ZEP-0002, crc32c-checked) is parsed in R, and its inner chunks decode via gzip (base R, always available) or - when the optional system library is present - c-blosc (the default blosc_zstd_bitshuffle / blosc_lz4_bitshuffle codecs) and libzstd (plain zstd). Dense and sparse matrices/vectors, including vlen-utf8 strings, are covered; flat sub-threshold components in a packed store read as before.
- CRAN-safe: configure probes for c-blosc / libzstd (honouring BLOSC_HOME / ZSTD_HOME / CONDA_PREFIX). With neither present the flat path is unchanged and a blosc/zstd-packed read raises an actionable “install c-blosc/libzstd” error. crc32c is always compiled (no dependency).
FilesFormat: packed/sharded read (FilesDaf, HttpDaf)
- Reading packed (chunked + compressed) FilesDaf / HttpDaf properties is now supported (read-only; dafr still writes flat). A packed property is a dual-format shard (<name>.zip, or <name>.<component>.zip for an independently-packed sparse component) that carries the same start-located Zarr v3 shard index as a packed ZarrDaf array. dafr reads it through that index, reusing the crc32c + gzip / c-blosc / libzstd decode backend (so the same optional-library rules and CRAN-safety apply). Dense and sparse matrices/vectors and vlen-utf8 strings are covered; flat sub-threshold components (small colptr, scalars, short vectors) in the same store read as before. A foreign "zipped"-only shard (a ZIP archive with no leading Zarr index) is rejected with an actionable message.
ZarrDaf over HTTP: Zarr v3 read
- Reading a Zarr v3 .daf.zarr over HTTP now works. zarr_daf("http://...") parses the v3 inline consolidated metadata from the root zarr.json (v3 does not write the v2 .zmetadata file) as its node index, serves node metadata from that index, and fetches chunks lazily over HTTP. Scalars, axes, dense and sparse vectors, strings, bools, and dense and sparse matrices all round-trip. A legacy Zarr v2 store served over HTTP is rejected with the same python -m zarr v2_to_v3 conversion hint as the local path.
- HttpDaf / FilesDaf over HTTP (the FilesFormat path, http_daf()) is unaffected; that path does not use Zarr at all.
Known limitations
- dafr reads packed/sharded v3 stores but only ever writes flat (the common default). Reading blosc/zstd-packed stores needs the optional c-blosc / libzstd system libraries (see above); gzip-packed and all flat stores read with no extra dependency. Local directory (DirStore), zip (MmapZipStore, .daf.zarr.zip), and HTTP v3 stores are all supported.
dafr 0.3.1
Fix: read DataAxesFormats.jl 0.3.0 FilesFormat v1.1 directories
DataAxesFormats.jl 0.3.0 bumped the FilesDaf on-disk format from 1.0 to 1.1. The binary blobs are byte-identical, but a sparse property’s JSON sidecar moved from top-level eltype/indtype keys to per-component descriptors (nzind/nzval for vectors; colptr/rowval/nzval for matrices). dafr was a 1.0-only reader and rejected 1.1 directories outright (incompatible format version: 1.1). dafr now:
- accepts FilesFormat minor version 1 (it still writes 1.0, and reads both 1.0 and 1.1);
- parses both the legacy top-level and the v1.1 per-component sparse descriptors - deriving the element type from the nzval component (or Bool when it is absent) and the index type from the index component - mirroring DataAxesFormats.jl’s parse_sparse_descriptor;
- raises a clear error on 0.3.0 “packed” (.zip, chunked + compressed) sparse components, which are not yet supported (re-save with flat components).
HttpDaf (a FilesDaf served over HTTP) gets the same treatment: it accepts v1.1, parses per-component sparse descriptors, and rejects packed components.
This covers reading flat FilesFormat 1.1 repos (directory and over HTTP) only, not the rest of 0.3.0 (the zarr/zip and “packed view of a directory as Zarr” machinery).
Fix: ZarrDaf on-disk format now interoperates with DataAxesFormats.jl
.daf.zarr stores written by dafr and by DataAxesFormats.jl were mutually unreadable. Opening a Julia-written store in R failed with missing daf.json; opening an R-written store in Julia failed with not a daf data set. Four divergences from upstream (DataAxesFormats.jl v0.2.0, src/zarr_format.jl) caused this, all now fixed:
- daf marker array. Upstream marks a store with a Zarr array named daf holding two UInt8 bytes [MAJOR, MINOR] = [1, 0] and validates via haskey(root.arrays, "daf"). dafr wrote a plain daf.json file instead. dafr now writes (and validates) the daf marker array and no longer writes daf.json for Zarr stores. This is a breaking change to the dafr Zarr on-disk format: .daf.zarr stores written by earlier dafr versions (which carry daf.json, not the daf array) are not readable by this version. The FilesDaf daf.json marker is unchanged.
- Intermediate .zgroup markers. Upstream writes a real .zgroup for every group (the four scalars / axes / vectors / matrices containers, eagerly, plus every sub-group). dafr only wrote the root .zgroup and synthesised the rest inside the consolidated .zmetadata - enough for zarr-python’s consolidated reader, but Julia’s directory-store open navigates real .zgroup files and so raised KeyError: key "axes" not found. dafr now writes a .zgroup for every group.
- Read-side dtype coverage. The chunk reader only understood <f8 / <i4 / <i8 / |b1 / |O. It now also reads |u1, <u1, <i1, <u2, <i2, <u4, <u8, and <f4 - the unsigned, narrow, and Float32 dtypes upstream legitimately emits (Float32 expression matrices, unsigned index arrays, and the |u1 marker itself).
- Dense-matrix chunk separator. dafr wrote multi-dimensional chunk keys with the / dimension separator (chunk file 0/0); upstream and the Zarr v2 default use . (chunk file 0.0). A Julia reader looked for 0.0, did not find it, and failed with missing chunks and no fill_value. (Sparse matrices were unaffected - their components are 1-D, whose single chunk is 0 either way.) dafr now uses the . separator for all arrays, matching upstream.
Round-trip interop in both directions is covered by tests/testthat/test-zarr-julia-interop.R (gated on the dafr-mcview Julia env), and zarr-python interop remains green.
Documentation pass
- Rewrote vignette("queries") to cover element-wise transforms, compound masks (string + builder), reductions to row / column / scalar, GroupBy on vectors and matrices, and IfMissing fallbacks. Previously the vignette only demonstrated trivial lookups.
- Expanded vignette("dafr") (Getting Started) with description(), a dplyr backend demo, a working files_daf round-trip, and explicit vignette pointers for chains / views / contracts / computations.
- Fixed stale “Limitations (0.1.0)” block in vignette("anndata"). Sparse CSR / CSC, categorical columns, dense layers, and obsm / varm matrices have all been supported since 0.2.x; the vignette now lists the actual current gaps (obsp, varp, raw, sparse layer / obsm / varm entries).
- README: dropped the “First public release: 0.1.0” status line, added Zarr and HTTP backends to Key Features, expanded the dplyr verb list, and pointed the DSL link at our own vignette("query-dsl-reference") instead of the Julia upstream page.
Fixed (docs only)
- vignette("dafr") persistence example called copy_all(d, fd) with destination and source swapped. The example was gated under eval = FALSE so the bug was invisible until the chunk was set to evaluate. Package code was correct; only the vignette was affected.
dafr 0.2.8.1
CI: fix stale Round test expectation (test-operations-registry.R:117)
Round(c(1.44, 1.55), digits = 1) returns c(1.4, 1.6) as Float64, which the default Int64 cast (Julia parity, Round 5/6) then rejects with InexactError. The test was written before that parity fix and still expected the float result. Updated to expect_error("InexactError") plus a type = "Float64" assertion that keeps the fractional-result coverage. Resolves the CI break introduced by v0.2.8.
Fix: set_matrix rejects matrices with mismatched dimnames (Round-7 G7)
set_matrix(d, "cell", "gene", "UMIs", m) with a dimnames-bearing matrix m previously discarded rownames(m) / colnames(m) silently and overwrote them with the axis entries on readback. A caller passing typo’d dimnames (e.g. c("X","Y","Z") against axis c("A","B","C")) saw no error.
Now set_matrix validates dimnames against axis entries and raises when they mismatch:
row names of the: matrix mismatch the entry names of the axis: <a>
column names of the: matrix mismatch the entry names of the axis: <a>
Mirrors Julia data.jl > set_matrix > named > !rows|!columns > name.
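A rough base R sketch of this kind of check follows. The function and its arguments are hypothetical and do not reproduce dafr's implementation; the error texts follow the messages quoted above, and requiring the names to match the axis entries in identical order is an assumption.

```r
# Hypothetical validator, not dafr's implementation. Error texts follow the
# messages quoted above; exact-order matching is an assumption.
check_dimnames <- function(m, row_entries, col_entries, rows_axis, cols_axis) {
  if (!is.null(rownames(m)) && !identical(rownames(m), row_entries)) {
    stop("row names of the: matrix mismatch the entry names of the axis: ",
         rows_axis)
  }
  if (!is.null(colnames(m)) && !identical(colnames(m), col_entries)) {
    stop("column names of the: matrix mismatch the entry names of the axis: ",
         cols_axis)
  }
  invisible(TRUE)
}

# Typo'd row names against axis entries A, B, C, as in the example above.
m <- matrix(1:6, nrow = 3,
            dimnames = list(c("X", "Y", "Z"), c("G1", "G2")))
result <- tryCatch(
  check_dimnames(m, c("A", "B", "C"), c("G1", "G2"), "cell", "gene"),
  error = function(e) conditionMessage(e)
)
result
```

A matrix without dimnames passes the check untouched, which matches the idea that only explicitly supplied names are validated.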
Tests: port copies.jl > matrix > sparse > {superset, disjoint} grid (Round-7 G8)
12 new regression tests covering the sparse-matrix copy edge cases Julia exercises:
- superset rows / cols with () (raises), empty = NULL (raises), empty = -1 (fills with -1), empty = 0 (fills with zero).
- disjoint rows / cols with (), empty = NULL, empty = -1 (all raise disjoint entries...).
dafr’s copy_matrix already matched Julia semantically; these guards lock the behaviour in. See tests/testthat/test-copies-sparse-grid.R.
dafr 0.2.8
Query DSL: Julia parity sweep (Round 5 + Round 6)
200 adversarial probes (Round 5) and a 1000-query grammar fuzzer (Round 6) surfaced a long tail of silent wrong-answer cases, type contracts that diverged from DataAxesFormats.jl, and error messages that had drifted out of alignment. Major user-visible changes:
- Centralised op-invocation validator. The per-op .reject_* dispatches scattered across five eltwise / reduction / grouped-reduction handlers are replaced by .OP_META + .validate_op_invocation in R/op_dispatch.R. The same type-tag rejection now fires at parse time before any axis / property lookup, mirroring Julia.
- integer64 is demoted in eltwise and reduction dispatch so Float64-only kernels see the expected type instead of bit-aliased doubles.
- Median preserves NaN like Julia rather than promoting to NA.
- Mask comparators on strings are bytewise so "é" < "f" agrees with Julia’s lexicographic order.
- Significant high/low, Round digits, Convert to Bool on non-{0,1}, Float32 sum to Int32, … now raise InexactError instead of returning silently-wrong values. Convert and Round accept the full set of dtype aliases (Int8/16/32/64, UInt8/16/32/64, Float32/64, Bool plus the lowercase R-style aliases).
- IfMissing default is validated at parse time. Typed defaults must be in-range for the declared eltype; hex / binary literals and the named constants pi / e are accepted.
- Latin-1 / Unicode value tokens raise the same unexpected character error as Julia instead of parsing through with a corrupted token.
- Bare ?? without a lookup chain raises invalid operation rather than returning a tautology.
- GroupBy on Bool keys uses lowercase "true" / "false" for bucket labels.
- NaN group key with IfMissing fills the NaN bucket with the user default instead of dropping it.
- BeginMask eager-rejects properties that are not matrix names so the error surfaces before the (more expensive) lookup path.
- Matrix-reduction fast paths honour the type = parameter via .cast_to_type; previously the type cast was silently dropped on the fast path.
- % Clamp low/high is rejected at parse time. (Julia exposes this via min / max; Clamp is not a DAF operation.)
- Error-message alignment with Julia. Collapses roughly 300 cosmetic divergence buckets in the parity fuzzer; common cases:
  - the parameter: X does not exist for the operation: Y
  - missing required parameter: X (Significant / Convert / Quantile)
  - expected: value (was expected value after comparator at ...)
  - Comparator-on-non-string drops the for the comparison operation: X suffix.
The Round-5 / Round-6 adversarial harness lives in dev/adversarial-parity/ (R + Julia runners, Python diff tool, 1100-query corpus, FINDINGS.md with per-bug provenance). About 55 regression tests added across tests/testthat/test-query-* and tests/testthat/test-operations-*.
Fix: cross-backend write/read parity (Round 7)
Six bugs surfaced by the new dev/backend-parity/ audit harness, which round-trips an 82-item fixture through Memory/Files/Zarr write -> reopen -> read and every (src, dst) copy_all pair. Before fixes: 11/246 (single-backend) and 35/567 (cross-backend) diverged. After: 0/246 and 0/567.
- NaN scalars are now accepted. set_scalar(d, "x", NaN) previously raised value may not be NA because .assert_scalar_value used is.na(value), which returns TRUE for NaN. NaN is a valid Float64 per Julia DAF; only true NA is rejected now.
- Float64 scalars round-trip at full precision on FilesDaf. set_scalar(d, "pi_val", pi) used to read back as 3.1416 because jsonlite::toJSON defaulted to digits = 4. The scalar writer now passes digits = 17.
- Int64 / UInt64 dense vectors round-trip across the full 64-bit range on FilesDaf. The reader’s readBin(what = "integer", size = 8L) silently truncated each value to its low 32 bits (base R has no 8-byte integer type), so values whose low 32 bits were zero (2^32, 2^62, -2^62, …) all came back as 0. The reader now reads 8-byte doubles and bit-aliases them into integer64.
- All-NaN Float64 vectors preserve NaN on FilesDaf. The auto-sparsifier counted NaN as zero (sum(vec != 0, na.rm = TRUE) drops NaN), so an all-NaN vector was written as an empty sparse vector and read back as all-zero. NaN is now counted as nonzero on the sparsify decision and kept in the sparse representation.
- ZarrDaf reorders named-subset vectors to axis order. set_vector(d, "cell", "x", c(C = 3, A = 1, B = 2)) against ZarrDaf previously stored values in input order; against Memory and Files it stored in axis order. .validate_vector_value (which performs the reorder) is now called in the user-facing set_vector dispatcher so every backend - current and future - receives an axis-ordered, un-named vec.
- FilesDaf scalar strings declare Encoding() == "UTF-8". The regex fast-path in .read_scalar_json returned bytes-only strings tagged "unknown". The byte content was always correct (identical() returned TRUE), but serialize()-based comparisons distinguished the tag.
Regression tests live in tests/testthat/test-backend-parity-r7.R (one focused case per bug class). The audit harness, fixture, findings doc, and diff tool live in dev/backend-parity/.
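The two NaN-related bugs above come from base R semantics that are easy to reproduce. The helper names below are hypothetical and are not dafr internals; the sketch only shows why is.na() alone cannot separate NaN from NA, and why a nonzero count with na.rm = TRUE loses NaN entries.

```r
# is.na() is TRUE for both NA and NaN, so it cannot tell them apart alone.
nan_flag <- is.na(NaN)

# Hypothetical helper: TRUE only for a true NA, not for NaN.
is_true_na <- function(x) is.na(x) & !is.nan(x)
na_flags <- is_true_na(c(NA, NaN, 1))

# NaN != 0 evaluates to NA, and na.rm = TRUE then drops it from the count.
vec <- c(NaN, NaN, NaN)
naive_count <- sum(vec != 0, na.rm = TRUE)

# Hypothetical helper: count NaN as nonzero, still ignoring true NA.
count_nonzero <- function(x) sum(is.nan(x) | (!is.na(x) & x != 0))
fixed_count <- count_nonzero(vec)
```

Here naive_count is 0 while fixed_count is 3, which is the difference between writing an all-NaN vector as an empty sparse vector and keeping its entries.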
dafr 0.2.7
Fix: row/col mask alignment under matrix GroupBy
@ axis [ prop = X ] @ other :: M -/ prop >- Op (and the GroupColumnsBy mirror) returned correct values but assigned them to the wrong group label. The mask filtered the matrix axis correctly, but the subsequent GroupRowsBy / GroupColumnsBy then fetched the group-property vector at the FULL axis length and matched it against the masked matrix by position - so the group labels and the matrix rows drifted out of alignment.
Concrete repro from the regression test:
d <- memory_daf(name = "t")
add_axis(d, "metacell", c("M1","M2","M3","M4","M5","M6"))
add_axis(d, "gene", c("G1","G2"))
add_axis(d, "type", c("A","B","C"))
set_vector(d, "metacell", "type", c("A","A","B","B","C","C"))
set_matrix(d, "metacell", "gene", "UMIs", matrix(
c(11,12, 21,22, 31,32, 41,42, 51,52, 61,62),
nrow = 6L, byrow = TRUE,
dimnames = list(c("M1","M2","M3","M4","M5","M6"), c("G1","G2"))))
# Pre-fix: returned A=72,74 (which is the M3+M4 sum, mis-labelled).
# Post-fix: B=72,74 only.
get_query(d, "@ metacell [ type = B ] @ gene :: UMIs -/ type >- Sum")
Julia parity: MatrixState in DataAxesFormats.jl keeps the per-axis VectorState on the matrix, so masks and groupings are always aligned by axis. The R port was passing matrix state through a plain list that dropped the row/col indices on transition; now the matrix-lookup carries row_indices / col_indices forward and apply_groupby_rows / apply_groupby_columns subset the group vector to match.
Two regression tests added under test-query-eval-masks.R (E3 row + cols variants).
dafr 0.2.6
CI: document source = param + refresh pkgdown index
R CMD check --as-cran on all platforms flagged WARNINGs after the v0.2.5 ship; pkgdown also failed the sitrep check:
- man/register_eltwise.Rd / man/register_reduction.Rd had the new source = NULL parameter (introduced as the conflict-source capture hook in v0.2.5’s CR1 fix) in \usage{} but no matching \item{source} arg block.
- _pkgdown.yml was missing the newly-exported register_query_operation() from the Op registry reference section.
Both regenerated and indexed. All man pages also re-emitted under roxygen 8.0.0 (drops \docType{data} / \format{} / \keyword{datasets} on constant exports; otherwise no behaviour change).
dafr 0.2.5
Query registry parity (CR1 closed)
- register_query_operation(kind, name, fn) is now exposed as a single user-facing entry point for adding custom eltwise / reduction ops to the dafr query DSL. Mirrors Julia’s register_query_operation() from DataAxesFormats.Registry.
- Collision errors now match Julia’s template:
conflicting registrations for the eltwise operation: <name>
first in: <file>:<line>
second in: <file>:<line>
Previously the error was <name> already registered; use overwrite = TRUE. The new wording includes both registration source locations (captured automatically from the caller’s srcref, or supplied explicitly via the new source = parameter).
dafr 0.2.4
CI: regenerate stale man pages
R CMD check on Linux devel / oldrel, Windows, and macOS hit code/documentation mismatches after the v0.2.2 / v0.2.3 ship. Two man pages were out of date with their code:
- man/is_leaf.Rd still documented the S7-generic signature function(daf, ...); the R2 refactor in v0.2.2 made is_leaf() a plain wrapper function(daf).
- man/reconstruct_axis.Rd was missing the properties_defaults parameter added in v0.2.3.
Both regenerated. No behavioural changes.
dafr 0.2.3
Concat / contracts / reconstruction parity (M1 + M4 + C1 + CR3)
- concatenate(merge = list(ALL_SCALARS = ...)) wildcards now expand against the source properties at concat time. Mirrors Julia’s [ALL_SCALARS => action] / [ALL_VECTORS => action] / [ALL_MATRICES => action]. Explicit non-wildcard keys still override wildcard-expanded entries. The “can’t collect axis for the scalar:” error wording now matches Julia byte-for-byte.
- concatenate(prefixed = list(axis = c(...))) is now an override that fires regardless of prefix[axis]. The list names cell-axis vectors whose values reference a prefixed axis and thus need the dataset name spliced in. Previously dafr ANDed the per-axis prefix flag with the list, so vectors listed in prefixed[cell] were not prefixed when prefix[cell] == FALSE.
- merge_contracts() rejects integer vs character (and other cross-lattice mixes) with Julia’s incompatible type: error. Previously dafr’s type lattice was a total order (logical < integer < double < character) so any pair quietly merged to the narrower type. Now character and integer64 are siblings to the numeric chain - same-chain merges still pick the narrower type, cross-chain merges error.
- reconstruct_axis(..., properties_defaults = list(prop = val)) merges into a pre-existing axis. Implicit-property values must be a subset of the axis entries; any extra entries get the per-property default value. Mirrors Julia’s reconstruct_axis!(..., properties_defaults = (; prop = val)).
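The type-lattice rule for merge_contracts() described above can be sketched in base R. This is a hypothetical illustration, not dafr's code: merge_types and numeric_chain are made-up names, and everything after the incompatible type: prefix in the error text is an assumption.

```r
# Hypothetical sketch of the lattice described above, not dafr's code.
# logical < integer < double form one chain; character and integer64
# are siblings that merge only with themselves.
numeric_chain <- c("logical", "integer", "double")

merge_types <- function(a, b) {
  if (identical(a, b)) {
    return(a)
  }
  if (a %in% numeric_chain && b %in% numeric_chain) {
    # Same chain: keep the narrower of the two types.
    narrower <- min(match(a, numeric_chain), match(b, numeric_chain))
    return(numeric_chain[narrower])
  }
  # Cross-chain: refuse to merge (message suffix is an assumption).
  stop("incompatible type: ", a, " vs ", b)
}

same_chain <- merge_types("integer", "double")
cross_chain <- tryCatch(
  merge_types("integer", "character"),
  error = function(e) conditionMessage(e)
)
```

With a total order the second call would have quietly returned a type; with siblings it surfaces the mismatch as an error instead.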
dafr 0.2.2
Reorder parity (R2 + R4 + R5 + R6 closed)
- is_leaf() accepts S7 class objects as well as instances, mirroring Julia’s is_leaf(::Type{<:DafReader}). Class-level call returns TRUE for the concrete leaf classes (MemoryDaf, FilesDaf{,ReadOnly}, ZarrDaf{,ReadOnly}, HttpDaf) and FALSE for the abstract DafReader / DafWriter / DafReadOnly.
- zarr_daf now supports reorder_axes(). Best-effort in-place reorder via the existing zarr overwrite path. Crash recovery (backup-and-restore) is not yet implemented; a mid-reorder crash leaves the store in an undefined state.
- reorder_axes(list(d1, d2), axis = perm) reorders multiple writers in one call (Julia: reorder_axes!([d1, d2], Dict(...))). Each axis must agree on entry order across every writer that has it (axis: <a> entries differ error otherwise); writers missing the axis silently skip it.
- memory_daf reorder is now atomic. A pre-reorder snapshot is parked on the daf’s internal state; a SimulatedCrash mid-reorder is rolled back by reset_reorder_axes(), which now returns TRUE if a pending reorder was rolled back, FALSE otherwise. Mirrors Julia’s reset_reorder_axes! Bool contract.
dafr 0.2.0
Julia parity (DataAxesFormats.jl 0.2.0)
This release closes the bulk of the user-facing semantic divergences between dafr and DataAxesFormats.jl main. The R query DSL now mirrors Julia byte-for-byte at the operator level; escape_value / unescape_value use Julia’s \<char> backslash convention.
Operation parity (CO1-CO7 + CT1/CT3 closed):
- % Op type=<T> now coerces the result. Previously type was silently ignored on the numeric eltwise ops (Abs / Exp / Sqrt / Round / Log / Clamp). Honors integer / numeric / double / Float32 / Float64 / Int8-64 / UInt8-64 / Bool / logical.
- % Log rejects non-positive x + eps with Julia’s value must be: positive template instead of returning NaN.
- % Fraction rejects integer types (Julia: value must be: a float type) and rejects scalar input with Julia’s applying Fraction eltwise operation to a scalar wording.
- % Significant accepts low only (defaults high = low), matching Julia.
- % GeoMean error wording aligned with Julia’s value must be: not negative / for the parameter: eps / for the operation: GeoMean template.
- % Abs / % Clamp reject non-numeric type= with Julia’s value must be: a number type template.
- All eltwise / reduction ops use the same Julia error template: invalid value: "<v>" / value must be: <constraint> / for the parameter: <name> / for the operation: <op>.
Viewer parity (V2-V7 closed):
- Wildcard * in view contracts validates that values are = or NULL (Julia: (*, *), (*, prop), (axis, *) shapes).
- Scalar-shape validation: a vector-producing query in a scalar slot now errors instead of silently exposing the vector via get_scalar.
- Strict-include semantics: passing data = list(...) to view_daf() now exposes only the listed properties; data = NULL retains the original “expose all” behaviour.
- __axis__ placeholder substitution in matrix slot queries.
- Vector-slot rejects matrix-shape queries with Julia’s matrix query: ... / for the vector: ... wording.
Naming parity:
- Grouped reduction results (-/, |/) now use alphabetical group ordering to match Julia’s factor(..., levels = sort(unique)), not first-appearance order.
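The difference between the two group orderings can be seen in plain base R. The sketch below does not call dafr; the groups and values vectors are made-up data, and tapply() stands in for a grouped reduction.

```r
# Made-up grouping vector and values (illustration only, no dafr calls).
groups <- c("T", "B", "T", "NK", "B")
values <- c(1, 2, 3, 4, 5)

# First-appearance ordering: levels follow the order groups are first seen.
first_seen <- tapply(values, factor(groups, levels = unique(groups)), sum)

# Alphabetical ordering: levels are the sorted unique group names.
alphabetical <- tapply(
  values,
  factor(groups, levels = sort(unique(groups))),
  sum
)

names(first_seen)
names(alphabetical)
```

The per-group sums are identical in both results; only the order of the entries (and therefore positional indexing) differs.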
Recovered fixes:
- complete_path() returns NULL for memory-backed dafs and non-FilesDaf chains instead of erroring; mirrors Julia’s complete_path returning nothing for non-Files dafs. complete_chain() retains its explicit error since it requires a real on-disk path.
- memory_daf accepts Matrix::sparseVector. The atomic-only validator now detects sparseVector via S4 inheritance; readback densifies before name attachment. Storage roundtrip preserves values.
Test suite (5579 PASS / 0 FAIL / 142 SKIP, vs 5024 PASS / 0 FAIL / 49 SKIP at the start of the parity push):
- 23 *-jl-parity.R files mirror DataAxesFormats.jl’s 22 main test files (contracts.jl is split across 7 sub-slices in R for tractable ports).
- The remaining 142 skips are out-of-scope feature gaps documented per-test (R divergence codes), not behavioral divergences: multi-contract @computation, description(deep), empty_* builders, h5df backend, file-bridge, tensors-in-views, etc.
Earlier in 0.2.0: Julia parity for chains, concat, reorder
A literal port of DataAxesFormats.jl::concat.jl, reorder.jl, and chains.jl test suites surfaced and closed five behavior gaps:
- concatenate(..., merge = list("axis|name" = MERGE_COLLECT_AXIS)) now honors empty= for missing-source columns. Previously, when a source lacked the vector being collect-axis-merged, that source’s column in the destination matrix was filled with NA regardless of the user’s empty= map. Now the empty fill is consulted; if neither the source nor empty provides a value, the same “no empty value” error as the per-axis vector path is raised.
- concatenate(..., merge = list("rows|cols|name" = MERGE_LAST_VALUE)) now actually fires for matrix properties (rows / cols not in the concat set). Previously this was a silent no-op: the dispatch reached the 3-part-key case but didn’t write anything. The destination now holds the last source’s matrix.
- reorder_axes() now errors on missing axes (was a silent skip). Aligned with Julia’s reorder_axes! contract.
- reset_reorder_axes() now returns invisible(TRUE/FALSE): TRUE if a pending reorder was rolled back, FALSE if no pending. Mirrors Julia’s Bool return. Existing callers that ignored the return value (most of them) are unaffected.
- ReadOnlyChainDaf / WriteChainDaf version counters now propagate from underlying sources. Previously the chain wrappers had their own private *_version_counter env that never tracked source mutations: vector_version_counter(chain, ...) returned 0 even after a source-side set_vector bumped the source’s counter. More importantly, this broke cache invalidation on the chain: after set_vector(chain, ..., overwrite = TRUE) (which routes the write to the chain’s writer), reads through the chain returned the cached pre-write value instead of the new value. The chain’s stamp / counter functions now sum per-source counters.
Windows support
files_daf() now works on Windows. Previously every write failed with MmapZipStore is not supported on Windows. On Windows, dafr skips the optional metadata.zip bundle (only used for serving a FilesDaf over HTTP via http_daf()); local reads, writes, and round-trips are unaffected. pack_files_daf_metadata() errors with a clear message, and zarr_daf() rejects .daf.zarr.zip paths there — use the unzipped .daf.zarr directory store instead, or run on Linux/macOS for zip-backed storage.
Named query results
get_query() and the format API now return named values matching the Julia NamedVector / NamedMatrix convention:
- Lookup vectors / matrices carry axis-entry names / dimnames.
- Axis listings (@ cell, @ donor [ age > 60 ]) return character vectors with names == values.
- IfMissing-default vectors (@ cell : missing || 0 Int64) carry names too.
This is a behavior change — code that does expect_equal(get_query(...), unnamed_vec) may need updating to expect named results. ALTREP-mmap vectors (mmap_real / mmap_int / mmap_lgl) preserve their ALTREP status across names<-, so the mmap region stays shared rather than copied.
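A short base-R sketch of what the change means for comparisons. The vectors below are hand-built stand-ins for query results (no dafr call is made, and the entry names C1..C3 are invented); they only show why a named result no longer matches an unnamed expectation and how unname() bridges the gap.

```r
# Hand-built stand-in for a named query result (entry names are invented).
named_result <- c(C1 = 0.5, C2 = 1.5, C3 = 2.5)
unnamed_expected <- c(0.5, 1.5, 2.5)

# The names attribute makes the two objects differ...
same_as_is <- identical(named_result, unnamed_expected)

# ...so either compare against a named expectation or drop the names first.
same_unnamed <- identical(unname(named_result), unnamed_expected)

# Axis listings are described as character vectors with names == values.
axis_listing <- c("C1", "C2", "C3")
names(axis_listing) <- axis_listing
names_match_values <- all(names(axis_listing) == axis_listing)
```

Tests that care about the names can build a named expectation instead of calling unname().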
Julia parity for get_query
Closes the remaining gaps from a literal port of the DataAxesFormats.jl::queries.jl test suite. New query forms supported:
- Top-level comparators after : / :: return a boolean vector / matrix: @ cell : score < 1.0, @ cell @ gene :: UMIs > 0.
- Standalone : / :: with IfMissing. : age || 1 @ cell = X returns 1 when age is missing; same for the matrix form :: UMIs || 0 @ cell = X @ gene = A.
- Implicit AsAxis fallback. @ cell : type.manual : color resolves through the type axis when type.manual is not itself an axis.
- Matrix-column slice auto-relayout. @ cell :: UMIs @ gene = A works regardless of UMIs storage orientation.
- Cols-axis mask after a second axis. @ rows @ cols [ filter ] :: M filters the cols axis; the matrix lookup honours both row and column filters.
- Virtual name property on every axis. [ name = X ] and : name return the axis-entry vector.
- Eltwise on scalar. . score % Abs applies element-wise on numeric scalars (was: '%' eltwise requires vector or matrix in scope).
- Regex escapes in masks ([ type ~ \^\[A-U\] ]).
- Empty-string round-trip. escape_value("") is ''; unescape_value("''") is "".
Stricter error reporting:
- Partial queries (@ cell @ gene with no lookup) error with invalid query: <canonical> instead of silently returning NULL. A second ? after a Names result also errors.
- Empty-matrix reductions without IfMissing always error (was: the output-axis-empty branch silently returned an empty vector).
- Numeric reductions on a character matrix error with non-numeric input instead of leaking base R’s 'x' must be numeric.
- ?? sentinel : prop raises a clear parse error when the sentinel can’t be coerced to the lookup vector’s type (was: silent NA via R’s as.integer warning).
- Parser errors for unknown operations / parameters and repeated parameters now match the Julia DAF wording.
- query_axis_name() agrees with get_query() on compound-mask queries (@ cell [ is_low & UMIs @ gene = B ]).
Reduction builders
Sum(), Mean(), Median(), Min(), Max(), Mode(), Count(), GeoMean(), Quantile(), Std(), StdN(), Var(), VarN() now produce the canonical reduction form (>> Sum) instead of % Sum. The previous emission was an element-wise op that erred at runtime when piped after a matrix or vector. Behavior change: stored canonical strings from these builders change from % Sum to >> Sum. ReduceToColumn() / ReduceToRow() accept both shapes for back-compat. canonical_query() also accepts a DafrQuery directly.
>> Mode / >| Mode now accept character and factor inputs, matching the Julia operation’s documented support for strings.
IfMissing defaults in vector and matrix lookups thread through the default coercion so : age || 1 returns an integer column (not a character one).
get_dataframe() column-spec
get_dataframe() and get_dataframe_query() accept a list mixing positional bare names with name = ":query" pairs:
get_dataframe(d, "cell", columns = list("age", doublet = ":is_doublet"))
Mirrors Julia’s ["age", "doublet" => ":is_doublet"].
Mask comparators on factor properties
[ prop < value ], [ prop > value ], etc. on a property stored as a factor (e.g. an h5ad categorical loaded via categorical encoding) now compare the stored strings lexically, matching Julia. Previously returned NA (unordered factor) or compared level codes (ordered factor).
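The three behaviours mentioned above (NA for unordered factors, level-code comparison for ordered factors, lexical comparison of the stored strings) can be reproduced in base R. This sketch uses made-up values and does not go through dafr's mask evaluation.

```r
stored <- c("b", "a", "c")  # made-up property values

# Unordered factor: base R warns and yields NA for every element.
unordered <- factor(stored)
res_unordered <- suppressWarnings(unordered < "b")

# Ordered factor with non-alphabetical levels: compares level positions,
# so "c" (level 1) sorts below "b" (level 2).
ordered_f <- factor(stored, levels = c("c", "b", "a"), ordered = TRUE)
res_ordered <- ordered_f < "b"

# Lexical comparison of the stored strings, the behaviour described above.
res_lexical <- as.character(unordered) < "b"
```

Only the last form depends purely on the string values rather than on how the factor's levels happen to be declared.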
dafr 0.2.0
Reader-API parity polish
- description(daf) now emits per-format header lines (url: for HttpDaf, path: + mode: for FilesDaf / ZarrDaf) after the name: / type: lines. New internal format_description_header generic mirrors upstream Formats.format_description_header; the default emits just type: <ClassName>, per-format methods extend it.
- New exported is_leaf(daf) predicate. Returns TRUE for storage formats that own their state directly (MemoryDaf, FilesDaf / FilesDafReadOnly, ZarrDaf / ZarrDafReadOnly, HttpDaf) and FALSE for wrappers (ReadOnlyChainDaf, WriteChainDaf, ContractDaf, ViewDaf). Mirrors upstream Readers.is_leaf.
- reorder_axes() now rejects non-leaf inputs up front with a clear "non-leaf type: <Class> for the daf data: <name> given to reorder_axes" error (previously surfaced as a cryptic missing-method dispatch).
- complete_path() now works for ZarrDaf (returns the directory path, :memory:, zip path, or HTTP URL, whichever store path the constructor recorded). Was previously FilesDaf-only.
HttpDaf + HttpStore + metadata.zip parity
- New HttpDaf backend for read-only access to a FilesDaf directory served over HTTP(S). The client downloads metadata.zip once at open and serves all JSON metadata from it; non-JSON payloads (.txt / .data / .nzind / .nzval / .colptr / .rowval / .nztxt) are fetched lazily via one HTTP GET each.
- New HttpStore (R/http_store.R) implements the R/zarr_store.R store interface over HTTP. zarr_daf("https://host/foo.daf.zarr/") routes through it; reads .zmetadata once and serves .zarray / .zattrs / .zgroup from there.
- open_daf("https://...") dispatches to http_daf or zarr_daf based on the URL suffix. HTTP backends are read-only; writable modes hard-error. *.daf.zarr.zip URLs are explicitly out of scope and redirect users to opening the underlying .daf.zarr directory.
- New pack_files_daf_metadata(path) exported helper to bundle a FilesDaf tree’s JSON metadata into metadata.zip (for trees written by older dafr or modified outside dafr).
- FilesDaf now maintains metadata.zip automatically on every set_* / delete_* / add_axis / delete_axis / reorder_axes operation, plus a one-shot rebuild on writable open if the bundle is missing. Mirrors upstream DataAxesFormats.jl::FilesFormat. Pre-0.2.0 stores are picked up automatically the first time they’re opened with mode "r+" or "w+".
- axes/metadata.json sidecar now maintained by FilesDaf (sorted JSON array of axis names). Required by HTTP clients to enumerate axes without GET-ing every axes/*.txt.
- New Imports: httr2 (and transitively curl).
MmapZipStore + Zarr zip backend
- New MmapZipStore (C++ in src/mmap_zip_store.cpp) backs ZarrDaf with a single ZIP archive on the local filesystem. open_daf() and zarr_daf() now accept .daf.zarr.zip paths and return a working ZarrDaf / ZarrDafReadOnly.
- Reads use a shared mmap of the archive: stored (method-0) entries are returned as zero-copy ALTREP RAW views via a new ZipRawAltrep class. Deflate-compressed (method-8) entries are decompressed on demand via system zlib; deflate64 / other methods raise a clear error pointing to a stored / deflate re-save.
- Writes append entries via upstream’s two-step commit protocol (commit central directory + EOCD first, then write the local file header and data into the now-sparse hole). Crash-safe: a writable open’s recovery pass detects partial commits via tail validation (LFH signature + data CRC32) and rolls back the trailing run of invalid entries before returning. Internal tick-counter hooks at every commit-able decision point let recovery be tested deterministically (5 tick points; tests gated on NOT_CRAN=true).
- Always emits ZIP64 (per upstream DataAxesFormats.jl); every local file header is padded with a 0xDAF1 extra field so the data region starts at an 8-byte-aligned file offset (zero-copy unaligned-load safety on every host architecture).
- ALTREP safety net: when the store closes (or is GC’d), every outstanding ALTREP vector it produced is deactivated: length() returns 0 and Dataptr() returns a stable inert byte. R callers who keep references past close get clean empty raws instead of segfaults.
- Internal-only dafr:::dafr_mmap_zip_reserve() / dafr:::dafr_mmap_zip_patch_crc() expose two-phase fill for large sparse arrays (writable in-place ALTREP view + post-fill CRC patch). Crash between reserve and patch rolls back via the same CRC-mismatch path as ordinary partial commits.
- SystemRequirements: zlib (linked via -lz).
- Cross-language smoke: dafr-written .daf.zarr.zip archives open cleanly in Python via zipfile and zarr.open(zarr.storage.ZipStore(...)). Foreign zips written by python -m zipfile (stored or deflate) open cleanly in dafr.
- Mirrors DataAxesFormats.jl mmap_zip_store.jl (~1070 LOC of Julia ported to ~1300 LOC of C++ + cpp11 + ALTREP).
ZarrDaf backend
- New zarr_daf(uri, mode, name) backend reading and writing Zarr v2. Two store impls: DirStore (filesystem directory tree) and DictStore (in-memory). Zip-backed Zarr is also supported via the MmapZipStore backend (see above).
- New files_to_zarr(src, dst) and zarr_to_files(src, dst) conversion helpers (same-filesystem only; correctness-first implementation re-encodes through the public API; hard-link optimization deferred as a perf follow-up).
- open_daf("foo.daf.zarr") now returns a ZarrDaf.
- Compression policy: dafr writes Zarr chunks uncompressed; reads uncompressed and gzip; rejects blosc/zstd/lz4 with a clear error pointing to re-save with compressor=None.
- Sparse layouts mirror upstream DataAxesFormats.jl: 1-based on-disk indices for nzind / colptr / rowval; sparse-Bool all-TRUE skips nzval (storage compaction). Cross-language parity is verified via gated Python zarr.open() smoke tests.
- Mirrors DataAxesFormats.jl v0.2.0 commits ea4b5f9 (Zarr v2 directory tree), 8cc3ff6 (in-memory store), 47e7693 (CRC fix; N/A for our in-memory layer), 79034fd (.zmetadata consolidation), 46d4ab2 (Files↔Zarr conversion).
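To make the 1-based index convention concrete, here is a minimal sketch using the Matrix package (assumed to be installed). It only shows the index-base shift from R's 0-based dgCMatrix slots to Julia-style 1-based colptr / rowval arrays; how dafr actually names, types, or writes these arrays on disk is not shown and should not be inferred from it.

```r
library(Matrix)

# A small 3 x 3 column-compressed sparse matrix with made-up values.
m <- sparseMatrix(
  i = c(1, 3, 2),
  j = c(1, 1, 3),
  x = c(10, 20, 30),
  dims = c(3, 3)
)

# dgCMatrix stores 0-based column pointers (@p) and row indices (@i).
# Adding 1 gives the 1-based form used by Julia's SparseMatrixCSC.
colptr <- m@p + 1L
rowval <- m@i + 1L
nzval <- m@x
```

For this matrix the empty second column shows up as two equal consecutive entries in colptr.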
reorder_axes() + open_daf() factory
- New reorder_axes(daf, axis = perm, ...) permutes axis entries in place, rewriting every vector and matrix that depends on the axis. On files_daf the operation is crash-recoverable via a .reorder.backup/ directory of hardlinks; on the next files_daf(path, mode = "r+" | "w+") open, any in-progress reorder is automatically rolled back to the pre-reorder state.
- New reset_reorder_axes(daf) to manually trigger recovery (mostly redundant given the auto-recovery on open).
- New open_daf(uri, mode, name) factory function that dispatches on path / URL pattern. memory:// (or no path) → memory_daf, filesystem path → files_daf, *.daf.zarr / *.daf.zarr.zip → zarr_daf, http(s):// → http_daf. The factory replaces the previous filesystem-only open_daf from R/complete.R.
- Mirrors DataAxesFormats.jl v0.2.0 commits 90301ff, 070bd34 (axis reordering) and b40377f (open_daf factory).
Internal: per-item cache_group refactor
The internal format API now returns per-item cache classifications, matching DataAxesFormats.jl v0.2.0 (upstream commit 49fbba1). No user-visible behavior change.
- Every backend format_get_* method (scalar / axis_array / vector / matrix) returns list(value, cache_group) instead of a bare value.
- Every backend format_set_* method returns the cache_group constant for the just-written value (or NULL) instead of invisible().
- New exported character constants MEMORY_DATA, MAPPED_DATA, QUERY_DATA, accepted by empty_cache(daf, clear = ...) / keep = ... alongside the existing lowercase forms.
- The reader-level cache (R/readers.R) now consults the backend-returned cache_group when storing fresh reads, instead of hardcoding the "memory" tier. mmap-eligible reads on files_daf now correctly land in the "mapped" tier.
- Per-item classification: files_daf returns MEMORY_DATA for string/factor reads (R’s CHARSXP cache makes mmap moot for strings) and MAPPED_DATA for everything else. Matches upstream’s structural classification, with no size thresholds.
This refactor is preparatory for the ZarrDaf and HttpDaf backends, which require per-item classification to drive their internal caching.
dafr 0.1.0 (development)
Query DSL: Julia-parity parser additions
Three DataAxesFormats.jl query-DSL features that were previously missing have been implemented, closing the last semantic gaps between dafr and the upstream Julia package:
- >> Reduction reduces a vector or matrix to a scalar (e.g. @ gene : is_lateral >> Sum type Int64, or @ cell @ gene :: UMIs >> Sum). >> is no longer silently aliased to >|; on a grouped input (... / g >> Sum) it continues to produce a per-group vector, as before.
- @ axis = entry picks one entry from a vector (@ cell : age @ cell = N89) or one cell from a matrix (@ cell @ gene :: UMIs @ cell = C @ gene = X). Two successive picks collapse a matrix to a scalar.
- || value type T attaches a Julia-style dtype (Bool, Int8..Int64, UInt8..UInt64, Float32, Float64, String) to a scalar-lookup default, matching the existing behaviour of the same suffix inside reductions and element-wise ops.
- IfMissing() builder gains an optional type kwarg (IfMissing(0, type = "Int64")).
dafr 0.1.0
First public release.
A native R + C++ implementation of the DataAxesFormats (DAF) data model for multi-dimensional data along arbitrary axes, ported from the Julia reference implementation with no Julia dependency.
Features
- Core model. Scalars, per-axis vectors, per-axis-pair matrices, axis entries, cache invalidation. memory_daf() and files_daf() backends.
- Query DSL. String form (daf["@ cell : donor"]) and pipe-chain builders (daf[Axis("cell") |> LookupVector("donor") |> IsGreater(2)]).
- Memory-mapped reads for vectors and sparse matrices from a read-only files_daf() store (zero-copy).
- OpenMP-parallel C++ kernels for Sum / Mean / Var / Mode / Quantile / GeoMean, with a CRAN-compliant 2-core auto-cap via set_num_threads().
- AnnData interop. DafAnnData R6 facade, as_anndata(), h5ad_as_daf(), daf_as_h5ad(): sparse X, categorical obs / var, nested uns, and obsm / varm all round-trip.
- dplyr backend. tbl(daf, axis) → lazy daf_axis_tbl with filter, select, mutate, arrange, summarise, group_by, distinct, pull, collect, compute (write-back).
- Contracts. create_contract(), verify_contract(), contract_scalar() / contract_vector() / contract_matrix() / tensor_contract() / axis_contract() for computation pre/post-condition validation.
See the pkgdown site for full documentation.