Changelog
Source: NEWS.md
dafr 0.2.0 (in development)
Windows support
files_daf() now works on Windows. Previously every write failed with "MmapZipStore is not supported on Windows". On Windows, dafr skips the optional metadata.zip bundle (only used for serving a FilesDaf over HTTP via http_daf()); local reads, writes, and round-trips are unaffected. pack_files_daf_metadata() errors with a clear message, and zarr_daf() rejects .daf.zarr.zip paths on Windows; use the unzipped .daf.zarr directory store instead, or run on Linux/macOS for zip-backed storage.
Named query results
get_query() and the format API now return named values matching the Julia NamedVector / NamedMatrix convention:
- Lookup vectors / matrices carry axis-entry names / dimnames.
- Axis listings (`@ cell`, `@ donor [ age > 60 ]`) return character vectors with `names == values`.
- IfMissing-default vectors (`@ cell : missing || 0 Int64`) carry names too.
This is a behavior change — code that does expect_equal(get_query(...), unnamed_vec) may need updating to expect named results. ALTREP-mmap vectors (mmap_real / mmap_int / mmap_lgl) preserve their ALTREP status across names<-, so the mmap region stays shared rather than copied.
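For tests affected by this change, the two usual fixes can be sketched in base R. The vector `res` below is a hand-built stand-in for a named query result (it is not produced by dafr), and its values are purely illustrative.

```r
# Hand-built stand-in for a named result such as an axis listing,
# where names == values. Not produced by dafr; values are illustrative.
res <- c(C1 = "C1", C2 = "C2", C3 = "C3")

# Old-style expectation: an unnamed vector.
expected_unnamed <- c("C1", "C2", "C3")

# identical() is attribute-sensitive, so the old comparison no longer holds.
old_check <- identical(res, expected_unnamed)

# Fix 1: drop the names before comparing.
fix_unname <- identical(unname(res), expected_unnamed)

# Fix 2: give the expectation the same names.
fix_named <- identical(res, setNames(expected_unnamed, expected_unnamed))
```

In a testthat suite the same choice applies: either wrap the result in `unname()` or update the expected value to carry names.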
Julia parity for get_query
Closes the remaining gaps from a literal port of the DataAxesFormats.jl::queries.jl test suite. New query forms supported:
- Top-level comparators after `:` / `::` return a boolean vector / matrix: `@ cell : score < 1.0`, `@ cell @ gene :: UMIs > 0`.
- Standalone `:` / `::` with `IfMissing`. `: age || 1 @ cell = X` returns `1` when `age` is missing; same for the matrix form `:: UMIs || 0 @ cell = X @ gene = A`.
- Implicit AsAxis fallback. `@ cell : type.manual : color` resolves through the `type` axis when `type.manual` is not itself an axis.
- Matrix-column slice auto-relayout. `@ cell :: UMIs @ gene = A` works regardless of `UMIs` storage orientation.
- Cols-axis mask after a second axis. `@ rows @ cols [ filter ] :: M` filters the cols axis; the matrix lookup honours both row and column filters.
- Virtual `name` property on every axis. `[ name = X ]` and `: name` return the axis-entry vector.
- Eltwise on scalar. `. score % Abs` applies element-wise on numeric scalars (was: `'%' eltwise requires vector or matrix in scope`).
- Regex escapes in masks (`[ type ~ \^\[A-U\] ]`).
- Empty-string round-trip. `escape_value("")` is `''`; `unescape_value("''")` is `""`.
Stricter error reporting:
- Partial queries (`@ cell @ gene` with no lookup) error with `invalid query: <canonical>` instead of silently returning `NULL`. A second `?` after a `Names` result also errors.
- Empty-matrix reductions without `IfMissing` always error (was: the output-axis-empty branch silently returned an empty vector).
- Numeric reductions on a character matrix error with `non-numeric input` instead of leaking base R's `'x' must be numeric`.
- `?? sentinel : prop` raises a clear parse error when the sentinel can't be coerced to the lookup vector's type (was: silent `NA` via R's `as.integer` warning).
- Parser errors for unknown operations / parameters and repeated parameters now match the Julia DAF wording.
- `query_axis_name()` agrees with `get_query()` on compound-mask queries (`@ cell [ is_low & UMIs @ gene = B ]`).
Reduction builders
Sum(), Mean(), Median(), Min(), Max(), Mode(), Count(), GeoMean(), Quantile(), Std(), StdN(), Var(), VarN() now produce the canonical reduction form (>> Sum) instead of % Sum. The previous emission was an element-wise op that erred at runtime when piped after a matrix or vector. Behavior change: stored canonical strings from these builders change from % Sum to >> Sum. ReduceToColumn() / ReduceToRow() accept both shapes for back-compat. canonical_query() also accepts a DafrQuery directly.
>> Mode / >| Mode now accept character and factor inputs, matching the Julia operation’s documented support for strings.
IfMissing defaults in vector and matrix lookups thread through the default coercion so : age || 1 returns an integer column (not a character one).
get_dataframe() column-spec
get_dataframe() and get_dataframe_query() accept a list mixing positional bare names with name = ":query" pairs:
    get_dataframe(d, "cell", columns = list("age", doublet = ":is_doublet"))

This mirrors Julia's `["age", "doublet" => ":is_doublet"]`.
Mask comparators on factor properties
[ prop < value ], [ prop > value ], etc. on a property stored as a factor (e.g. an h5ad categorical loaded via categorical encoding) now compare the stored strings lexically, matching Julia. Previously returned NA (unordered factor) or compared level codes (ordered factor).
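The three behaviours named above (NA for unordered factors, level-order comparison for ordered factors, and lexical comparison of the stored strings) can be reproduced in base R. This is only an illustration of the underlying R semantics with made-up values; it does not call dafr and says nothing about how dafr implements the mask.

```r
# Made-up values; the ordered factor deliberately uses a non-alphabetical
# level order (C < B < A) so the two comparisons disagree.
f_unordered <- factor(c("B", "A", "C"))
f_ordered <- factor(c("B", "A", "C"), levels = c("C", "B", "A"), ordered = TRUE)

# Unordered factor: `<` is not meaningful, so R warns and returns NA.
cmp_unordered <- suppressWarnings(f_unordered < "B")

# Ordered factor: comparison follows level order, not the strings.
cmp_ordered <- f_ordered < "B"

# Comparing the stored strings gives the lexical result.
cmp_lexical <- as.character(f_unordered) < "B"
```

Here `cmp_ordered` flags only `"C"` (the level below `"B"`), while `cmp_lexical` flags only `"A"`, which is the result the changelog entry describes as matching Julia.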
dafr 0.2.0
Reader-API parity polish
- `description(daf)` now emits per-format header lines (`url:` for `HttpDaf`, `path:` + `mode:` for `FilesDaf` / `ZarrDaf`) after the `name:` / `type:` lines. New internal `format_description_header` generic mirrors upstream `Formats.format_description_header`; the default emits just `type: <ClassName>`, per-format methods extend it.
- New exported `is_leaf(daf)` predicate. Returns `TRUE` for storage formats that own their state directly (`MemoryDaf`, `FilesDaf` / `FilesDafReadOnly`, `ZarrDaf` / `ZarrDafReadOnly`, `HttpDaf`) and `FALSE` for wrappers (`ReadOnlyChainDaf`, `WriteChainDaf`, `ContractDaf`, `ViewDaf`). Mirrors upstream `Readers.is_leaf`.
- `reorder_axes()` now rejects non-leaf inputs up front with a clear `"non-leaf type: <Class> for the daf data: <name> given to reorder_axes"` error (previously surfaced as a cryptic missing-method dispatch).
- `complete_path()` now works for `ZarrDaf` (returns the directory path, `:memory:`, zip path, or HTTP URL, whichever store path the constructor recorded). Was previously `FilesDaf`-only.
HttpDaf + HttpStore + metadata.zip parity
- New `HttpDaf` backend for read-only access to a `FilesDaf` directory served over HTTP(S). The client downloads `metadata.zip` once at open and serves all JSON metadata from it; non-JSON payloads (`.txt` / `.data` / `.nzind` / `.nzval` / `.colptr` / `.rowval` / `.nztxt`) are fetched lazily via one HTTP GET each.
- New `HttpStore` (`R/http_store.R`) implements the `R/zarr_store.R` store interface over HTTP. `zarr_daf("https://host/foo.daf.zarr/")` routes through it; reads `.zmetadata` once and serves `.zarray` / `.zattrs` / `.zgroup` from there.
- `open_daf("https://...")` dispatches to `http_daf` or `zarr_daf` based on the URL suffix. HTTP backends are read-only; writable modes hard-error. `*.daf.zarr.zip` URLs are explicitly out of scope and redirect users to opening the underlying `.daf.zarr` directory.
- New `pack_files_daf_metadata(path)` exported helper to bundle a `FilesDaf` tree's JSON metadata into `metadata.zip` (for trees written by older dafr or modified outside dafr).
- `FilesDaf` now maintains `metadata.zip` automatically on every `set_*` / `delete_*` / `add_axis` / `delete_axis` / `reorder_axes` operation, plus a one-shot rebuild on writable open if the bundle is missing. Mirrors upstream `DataAxesFormats.jl::FilesFormat`. Pre-0.2.0 stores are picked up automatically the first time they're opened with mode `"r+"` or `"w+"`.
- `axes/metadata.json` sidecar now maintained by FilesDaf (sorted JSON array of axis names). Required by HTTP clients to enumerate axes without GET-ing every `axes/*.txt`.
- New `Imports:` `httr2` (and transitively `curl`).
MmapZipStore + Zarr zip backend
- New `MmapZipStore` (C++ in `src/mmap_zip_store.cpp`) backs `ZarrDaf` with a single ZIP archive on the local filesystem. `open_daf()` and `zarr_daf()` now accept `.daf.zarr.zip` paths and return a working `ZarrDaf` / `ZarrDafReadOnly`.
- Reads use a shared mmap of the archive: stored (method-0) entries are returned as zero-copy `ALTREP RAW` views via a new `ZipRawAltrep` class. Deflate-compressed (method-8) entries are decompressed on demand via system zlib; deflate64 / other methods raise a clear error pointing to a stored / deflate re-save.
- Writes append entries via upstream's two-step commit protocol (commit central directory + EOCD first, then write the local file header and data into the now-sparse hole). Crash-safe: a writable open's recovery pass detects partial commits via tail validation (LFH signature + data CRC32) and rolls back the trailing run of invalid entries before returning. Internal tick-counter hooks at every commit-able decision point let recovery be tested deterministically (5 tick points; tests gated on `NOT_CRAN=true`).
- Always emits ZIP64 (per upstream `DataAxesFormats.jl`); every local file header is padded with a `0xDAF1` extra field so the data region starts at an 8-byte-aligned file offset (zero-copy unaligned-load safety on every host architecture).
- ALTREP safety net: when the store closes (or is GC'd), every outstanding ALTREP vector it produced is deactivated: `length()` returns 0 and `Dataptr()` returns a stable inert byte. R callers who keep references past close get clean empty raws instead of segfaults.
- Internal-only `dafr:::dafr_mmap_zip_reserve()` / `dafr:::dafr_mmap_zip_patch_crc()` expose two-phase fill for large sparse arrays (writable in-place ALTREP view + post-fill CRC patch). Crash between reserve and patch rolls back via the same CRC-mismatch path as ordinary partial commits.
- `SystemRequirements: zlib` (linked via `-lz`).
- Cross-language smoke: dafr-written `.daf.zarr.zip` archives open cleanly in Python via `zipfile` and `zarr.open(zarr.storage.ZipStore(...))`. Foreign zips written by `python -m zipfile` (stored or deflate) open cleanly in dafr.
- Mirrors `DataAxesFormats.jl` `mmap_zip_store.jl` (~1070 LOC of Julia ported to ~1300 LOC of C++ + cpp11 + ALTREP).
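The 8-byte alignment padding mentioned above can be sketched as simple offset arithmetic. The 30-byte fixed local-file-header size and the 4-byte extra-field header (2-byte ID plus 2-byte size) come from the ZIP format; the helper name `daf1_pad_len()` and the `other_extra` parameter are hypothetical, and how dafr actually lays out its extra fields (for example alongside ZIP64 fields) is an assumption here, not something this changelog states.

```r
# Hypothetical helper: payload bytes needed in a padding extra field so that
# the entry's data starts at a multiple of `align` bytes.
daf1_pad_len <- function(lfh_offset, name_len, other_extra = 0L, align = 8L) {
  fixed_lfh <- 30L      # fixed part of a ZIP local file header
  extra_header <- 4L    # 2-byte field ID + 2-byte field size
  unpadded <- lfh_offset + fixed_lfh + name_len + other_extra + extra_header
  (-unpadded) %% align  # R's %% takes the sign of the divisor, so this is >= 0
}

# Example: header at offset 0, 7-byte entry name, no other extra fields.
pad <- daf1_pad_len(0L, 7L)
data_offset <- 0L + 30L + 7L + 4L + pad
```

With these inputs the unpadded data offset would be 41, so 7 payload bytes move it to 48, the next multiple of 8.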
ZarrDaf backend
- New `zarr_daf(uri, mode, name)` backend reading and writing Zarr v2. Two store impls: `DirStore` (filesystem directory tree) and `DictStore` (in-memory). Zip-backed Zarr is also supported via the `MmapZipStore` backend (see above).
- New `files_to_zarr(src, dst)` and `zarr_to_files(src, dst)` conversion helpers (same-filesystem only; correctness-first implementation re-encodes through the public API; hard-link optimization deferred as a perf follow-up).
- `open_daf("foo.daf.zarr")` now returns a `ZarrDaf`.
- Compression policy: dafr writes Zarr chunks uncompressed; reads uncompressed and gzip; rejects blosc/zstd/lz4 with a clear error pointing to re-save with `compressor=None`.
- Sparse layouts mirror upstream `DataAxesFormats.jl`: 1-based on-disk indices for `nzind` / `colptr` / `rowval`; sparse-Bool all-`TRUE` skips `nzval` (storage compaction). Cross-language parity is verified via gated Python `zarr.open()` smoke tests.
- Mirrors `DataAxesFormats.jl` v0.2.0 commits `ea4b5f9` (Zarr v2 directory tree), `8cc3ff6` (in-memory store), `47e7693` (CRC fix; N/A for our in-memory layer), `79034fd` (`.zmetadata` consolidation), `46d4ab2` (Files↔Zarr conversion).
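To make the 1-based sparse convention concrete, here is a small base-R sketch that derives compressed-sparse-column arrays from a dense matrix. The array names follow the changelog (`colptr` / `rowval` / `nzval`); the helper `dense_to_csc1()` is hypothetical and is not part of dafr, and the sketch says nothing about the actual on-disk encoding.

```r
# Hypothetical helper: compressed-sparse-column arrays with 1-based indices.
dense_to_csc1 <- function(m) {
  nz <- which(m != 0, arr.ind = TRUE)   # nonzeros in column-major order
  counts <- tabulate(nz[, "col"], nbins = ncol(m))
  list(
    colptr = cumsum(c(1L, counts)),     # 1-based start of each column, plus end
    rowval = unname(nz[, "row"]),       # 1-based row index of each nonzero
    nzval  = m[m != 0]
  )
}

# 3 x 2 matrix: column 1 is (0, 2, 0), column 2 is (1, 0, 3).
m <- matrix(c(0, 2, 0,
              1, 0, 3), nrow = 3)
csc <- dense_to_csc1(m)
```

For comparison, the `dgCMatrix` class in R's Matrix package keeps its `i` and `p` slots 0-based, so a 1-based layout like the one above differs from it by exactly one in both arrays.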
reorder_axes() + open_daf() factory
- New `reorder_axes(daf, axis = perm, ...)` permutes axis entries in place, rewriting every vector and matrix that depends on the axis. On `files_daf` the operation is crash-recoverable via a `.reorder.backup/` directory of hardlinks; on the next `files_daf(path, mode = "r+" | "w+")` open, any in-progress reorder is automatically rolled back to the pre-reorder state.
- New `reset_reorder_axes(daf)` to manually trigger recovery (mostly redundant given the auto-recovery on open).
- New `open_daf(uri, mode, name)` factory function, which dispatches on path / URL pattern. `memory://` (or no path) → `memory_daf`, filesystem path → `files_daf`, `*.daf.zarr` / `*.daf.zarr.zip` → `zarr_daf`, `http(s)://` → `http_daf`. The factory replaces the previous filesystem-only `open_daf` from `R/complete.R`.
- Mirrors `DataAxesFormats.jl` v0.2.0 commits `90301ff`, `070bd34` (axis reordering) and `b40377f` (`open_daf` factory).
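The dispatch rules listed above can be summarised as a small classifier. This is a sketch only: `guess_backend()` is a hypothetical function that returns the name of the backend constructor without opening anything, the regular expressions are assumptions about what "suffix" means, and the real `open_daf()` may differ in details such as trailing-slash handling or platform checks.

```r
# Hypothetical classifier mirroring the dispatch rules in this changelog.
# It only returns a backend name; it does not open any store.
guess_backend <- function(uri = NULL) {
  if (is.null(uri) || startsWith(uri, "memory://")) return("memory_daf")
  is_zarr_dir <- grepl("\\.daf\\.zarr/?$", uri)
  is_zarr_zip <- grepl("\\.daf\\.zarr\\.zip$", uri)
  if (grepl("^https?://", uri)) {
    # Zip-backed Zarr over HTTP is described as out of scope.
    if (is_zarr_zip) stop("open the underlying .daf.zarr directory instead")
    return(if (is_zarr_dir) "zarr_daf" else "http_daf")
  }
  if (is_zarr_dir || is_zarr_zip) "zarr_daf" else "files_daf"
}

local_zarr <- guess_backend("foo.daf.zarr")
remote_zarr <- guess_backend("https://host/foo.daf.zarr/")
remote_files <- guess_backend("https://host/foo/")
```

Keeping the HTTP branch ahead of the suffix check reflects the note that HTTP URLs route to either `http_daf` or `zarr_daf` depending on the suffix, while local paths never reach `http_daf`.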
Internal: per-item cache_group refactor
The internal format API now returns per-item cache classifications, matching DataAxesFormats.jl v0.2.0 (upstream commit 49fbba1). No user-visible behavior change.
- Every backend `format_get_*` method (scalar / axis_array / vector / matrix) returns `list(value, cache_group)` instead of a bare value.
- Every backend `format_set_*` method returns the cache_group constant for the just-written value (or `NULL`) instead of `invisible()`.
- New exported character constants `MEMORY_DATA`, `MAPPED_DATA`, `QUERY_DATA`, accepted by `empty_cache(daf, clear = ...)` / `keep = ...` alongside the existing lowercase forms.
- The reader-level cache (`R/readers.R`) now consults the backend-returned cache_group when storing fresh reads, instead of hardcoding the `"memory"` tier. mmap-eligible reads on `files_daf` now correctly land in the `"mapped"` tier.
- Per-item classification: `files_daf` returns `MEMORY_DATA` for string/factor reads (R's CHARSXP cache makes mmap moot for strings) and `MAPPED_DATA` for everything else. Matches upstream's structural classification, with no size thresholds.
This refactor is preparatory for the ZarrDaf and HttpDaf backends, which require per-item classification to drive their internal caching.
dafr 0.1.0 (development)
Query DSL: Julia-parity parser additions
Three DataAxesFormats.jl query-DSL features that were previously missing have been implemented, closing the last semantic gaps between dafr and the upstream Julia package:
- `>> Reduction` reduces a vector or matrix to a scalar (e.g. `@ gene : is_lateral >> Sum type Int64`, or `@ cell @ gene :: UMIs >> Sum`). `>>` is no longer silently aliased to `>|`; on a grouped input (`... / g >> Sum`) it continues to produce a per-group vector, as before.
- `@ axis = entry` picks one entry from a vector (`@ cell : age @ cell = N89`) or one cell from a matrix (`@ cell @ gene :: UMIs @ cell = C @ gene = X`). Two successive picks collapse a matrix to a scalar.
- `|| value type T` attaches a Julia-style dtype (`Bool`, `Int8`..`Int64`, `UInt8`..`UInt64`, `Float32`, `Float64`, `String`) to a scalar-lookup default, matching the existing behaviour of the same suffix inside reductions and element-wise ops.
- `IfMissing()` builder gains an optional `type` kwarg (`IfMissing(0, type = "Int64")`).
dafr 0.1.0
First public release.
A native R + C++ implementation of the DataAxesFormats (DAF) data model for multi-dimensional data along arbitrary axes, ported from the Julia reference implementation with no Julia dependency.
Features
- Core model. Scalars, per-axis vectors, per-axis-pair matrices, axis entries, cache invalidation. `memory_daf()` and `files_daf()` backends.
- Query DSL. String form (`daf["@ cell : donor"]`) and pipe-chain builders (`daf[Axis("cell") |> LookupVector("donor") |> IsGreater(2)]`).
- Memory-mapped reads for vectors and sparse matrices from a read-only `files_daf()` store (zero-copy).
- OpenMP-parallel C++ kernels for Sum / Mean / Var / Mode / Quantile / GeoMean, with a CRAN-compliant 2-core auto-cap via `set_num_threads()`.
- AnnData interop. `DafAnnData` R6 facade, `as_anndata()`, `h5ad_as_daf()`, `daf_as_h5ad()`; sparse `X`, categorical `obs` / `var`, nested `uns`, and `obsm` / `varm` all round-trip.
- dplyr backend. `tbl(daf, axis)` → lazy `daf_axis_tbl` with `filter`, `select`, `mutate`, `arrange`, `summarise`, `group_by`, `distinct`, `pull`, `collect`, `compute` (write-back).
- Contracts. `create_contract()`, `verify_contract()`, `contract_scalar()` / `contract_vector()` / `contract_matrix()` / `tensor_contract()` / `axis_contract()` for computation pre/post-condition validation.
See the pkgdown site for full documentation.