dafr 0.2.0 (in development)

Windows support

files_daf() now works on Windows. Previously every write failed with "MmapZipStore is not supported on Windows". On Windows, dafr skips the optional metadata.zip bundle, which is only used for serving a FilesDaf over HTTP via http_daf(); local reads, writes, and round-trips are unaffected. pack_files_daf_metadata() errors with a clear message on Windows, and zarr_daf() rejects .daf.zarr.zip paths there. Use the unzipped .daf.zarr directory store instead, or run on Linux/macOS for zip-backed storage.
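
The path choice described above can be sketched in base R. choose_zarr_path() is a hypothetical helper, not part of dafr, and it assumes the store lives at <base>.daf.zarr or <base>.daf.zarr.zip:

```r
# Hypothetical helper (not part of dafr): pick a Zarr store path that is
# usable on the current platform. Zip-backed stores are avoided on Windows.
choose_zarr_path <- function(base, os_type = .Platform$OS.type) {
  if (identical(os_type, "windows")) {
    paste0(base, ".daf.zarr")      # unzipped directory store
  } else {
    paste0(base, ".daf.zarr.zip")  # zip-backed store (Linux/macOS)
  }
}

choose_zarr_path("cells", os_type = "windows")  # "cells.daf.zarr"
choose_zarr_path("cells", os_type = "unix")     # "cells.daf.zarr.zip"
```

The resulting path would then be handed to zarr_daf() or open_daf(); the os_type argument exists only so both branches can be exercised on any machine.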

Named query results

get_query() and the format API now return named values matching the Julia NamedVector / NamedMatrix convention:

  • Lookup vectors / matrices carry axis-entry names / dimnames.
  • Axis listings (@ cell, @ donor [ age > 60 ]) return character vectors with names == values.
  • IfMissing-default vectors (@ cell : missing || 0 Int64) carry names too.

This is a behavior change: tests such as expect_equal(get_query(...), unnamed_vec) may need updating to expect named results. ALTREP-mmap vectors (mmap_real / mmap_int / mmap_lgl) preserve their ALTREP status across names<-, so the mmap region stays shared rather than copied.
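
To see why existing comparisons may break, here is a small base R illustration. The vectors are stand-ins for query results (no dafr call is made), and the entry names C1..C3 are invented for the example:

```r
# Stand-ins for query results; no dafr call is made here.
unnamed <- c(31, 64, 58)
named   <- c(C1 = 31, C2 = 64, C3 = 58)   # shape of a named lookup vector

identical(named, unnamed)           # FALSE: names are part of the value
identical(unname(named), unnamed)   # TRUE: drop names before comparing

# Axis listings carry names equal to their values.
axis_entries <- c("C1", "C2", "C3")
listing <- stats::setNames(axis_entries, axis_entries)
all(names(listing) == listing)      # TRUE
```

Tests that care only about values can wrap the result in unname(); tests that care about entry order can compare against a named expectation instead.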

Julia parity for get_query

Closes the remaining gaps from a literal port of the DataAxesFormats.jl::queries.jl test suite. New query forms supported:

  • Top-level comparators after : / :: return a boolean vector / matrix: @ cell : score < 1.0, @ cell @ gene :: UMIs > 0.
  • Standalone : / :: with IfMissing. : age || 1 @ cell = X returns 1 when age is missing; same for the matrix form :: UMIs || 0 @ cell = X @ gene = A.
  • Implicit AsAxis fallback. @ cell : type.manual : color resolves through the type axis when type.manual is not itself an axis.
  • Matrix-column slice auto-relayout. @ cell :: UMIs @ gene = A works regardless of UMIs storage orientation.
  • Cols-axis mask after a second axis. @ rows @ cols [ filter ] :: M filters the cols axis; the matrix lookup honours both row and column filters.
  • Virtual name property on every axis. [ name = X ] and : name return the axis-entry vector.
  • Eltwise on scalar. . score % Abs applies element-wise on numeric scalars (was: '%' eltwise requires vector or matrix in scope).
  • Regex escapes in masks ([ type ~ \^\[A-U\] ]).
  • Empty-string round-trip. escape_value("") is ''; unescape_value("''") is "".

Stricter error reporting:

  • Partial queries (@ cell @ gene with no lookup) error with invalid query: <canonical> instead of silently returning NULL. A second ? after a Names result also errors.
  • Empty-matrix reductions without IfMissing always error (was: the output-axis-empty branch silently returned an empty vector).
  • Numeric reductions on a character matrix error with non-numeric input instead of leaking base R’s 'x' must be numeric.
  • ?? sentinel : prop raises a clear parse error when the sentinel can’t be coerced to the lookup vector’s type (was: silent NA via R’s as.integer warning).
  • Parser errors for unknown operations / parameters and repeated parameters now match the Julia DAF wording.
  • query_axis_name() agrees with get_query() on compound-mask queries (@ cell [ is_low & UMIs @ gene = B ]).
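
The sentinel-coercion item above replaces a silent NA with an error. A minimal base R sketch of that idea follows; strict_as_integer() and its message are hypothetical and do not reproduce dafr's wording:

```r
# Hypothetical helper: coerce a sentinel string to integer, turning R's
# "NAs introduced by coercion" warning into an error.
strict_as_integer <- function(text) {
  tryCatch(
    as.integer(text),
    warning = function(w) {
      stop(sprintf("cannot coerce sentinel '%s' to integer", text), call. = FALSE)
    }
  )
}

strict_as_integer("7")                           # 7L
result <- try(strict_as_integer("abc"), silent = TRUE)
inherits(result, "try-error")                    # TRUE
```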

Reduction builders

Sum(), Mean(), Median(), Min(), Max(), Mode(), Count(), GeoMean(), Quantile(), Std(), StdN(), Var(), VarN() now produce the canonical reduction form (>> Sum) instead of % Sum. The previous emission was an element-wise op that erred at runtime when piped after a matrix or vector. Behavior change: stored canonical strings from these builders change from % Sum to >> Sum. ReduceToColumn() / ReduceToRow() accept both shapes for back-compat. canonical_query() also accepts a DafrQuery directly.

>> Mode / >| Mode now accept character and factor inputs, matching the Julia operation’s documented support for strings.
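
As a rough illustration of what a mode over strings means, here is a base R sketch. mode_of() is a hypothetical helper, not dafr's implementation, and its tie-breaking (first value in table() order) is an assumption that may differ from the Julia operation:

```r
# Hypothetical helper, not dafr's implementation: most frequent value of a
# character or factor vector. Ties go to the first value in table() order.
mode_of <- function(x) {
  counts <- table(as.character(x))
  names(counts)[which.max(counts)]
}

mode_of(c("T", "B", "T", "NK"))      # "T"
mode_of(factor(c("b", "a", "b")))    # "b"
```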

IfMissing defaults in vector and matrix lookups are now passed through the default coercion, so : age || 1 returns an integer column (not a character one).

get_dataframe() column-spec

get_dataframe() and get_dataframe_query() accept a list mixing positional bare names with name = ":query" pairs:

get_dataframe(d, "cell", columns = list("age", doublet = ":is_doublet"))

Mirrors Julia’s ["age", "doublet" => ":is_doublet"].
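
One way to picture how such a mixed list can be read is the base R sketch below. normalize_columns() is hypothetical and not dafr's code; it assumes a bare entry names a column after itself:

```r
# Hypothetical helper: turn a mixed column spec into name -> query pairs.
# Bare entries are assumed to name the column after themselves.
normalize_columns <- function(columns) {
  spec_names <- names(columns)
  if (is.null(spec_names)) spec_names <- rep("", length(columns))
  values <- vapply(columns, as.character, character(1))
  bare <- spec_names == ""
  spec_names[bare] <- values[bare]
  stats::setNames(values, spec_names)
}

normalize_columns(list("age", doublet = ":is_doublet"))
# named character vector: age = "age", doublet = ":is_doublet"
```

The key detail is that names() on a partially named list returns "" for the positional entries, which is what lets the two styles be mixed in one list.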

Mask comparators on factor properties

[ prop < value ], [ prop > value ], etc. on a property stored as a factor (e.g. an h5ad categorical loaded via categorical encoding) now compare the stored strings lexically, matching Julia. Previously these comparisons returned NA (unordered factor) or compared level codes (ordered factor).
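
The difference between the old and new behavior can be reproduced in base R without dafr. The values below are invented for the example:

```r
values <- c("alpha", "beta", "gamma")

# Unordered factor: base R returns NA (with a warning) for '<'.
unordered <- factor(values)
suppressWarnings(unordered < "beta")     # NA NA NA

# Ordered factor: base R compares level positions, not the strings.
reversed <- factor(values, levels = rev(values), ordered = TRUE)
reversed < "beta"                        # FALSE FALSE TRUE

# Comparing the stored strings gives the lexical result.
as.character(reversed) < "beta"          # TRUE FALSE FALSE
```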

Build hygiene

.Rbuildignore excludes AGENTS.md and CLAUDE.md so they no longer surface as a top-level files NOTE under R CMD check.

dafr 0.2.0

Reader-API parity polish

  • description(daf) now emits per-format header lines (url: for HttpDaf, path: + mode: for FilesDaf / ZarrDaf) after the name: / type: lines. New internal format_description_header generic mirrors upstream Formats.format_description_header; the default emits just type: <ClassName>, per-format methods extend it.
  • New exported is_leaf(daf) predicate. Returns TRUE for storage formats that own their state directly (MemoryDaf, FilesDaf / FilesDafReadOnly, ZarrDaf / ZarrDafReadOnly, HttpDaf) and FALSE for wrappers (ReadOnlyChainDaf, WriteChainDaf, ContractDaf, ViewDaf). Mirrors upstream Readers.is_leaf.
  • reorder_axes() now rejects non-leaf inputs up front with a clear "non-leaf type: <Class> for the daf data: <name> given to reorder_axes" error (previously surfaced as a cryptic missing-method dispatch).
  • complete_path() now works for ZarrDaf (returns the directory path, :memory:, zip path, or HTTP URL — whichever store path the constructor recorded). Was previously FilesDaf-only.

HttpDaf + HttpStore + metadata.zip parity

  • New HttpDaf backend for read-only access to a FilesDaf directory served over HTTP(S). The client downloads metadata.zip once at open and serves all JSON metadata from it; non-JSON payloads (.txt / .data / .nzind / .nzval / .colptr / .rowval / .nztxt) are fetched lazily via one HTTP GET each.
  • New HttpStore (R/http_store.R) implements the R/zarr_store.R store interface over HTTP. zarr_daf("https://host/foo.daf.zarr/") routes through it; reads .zmetadata once and serves .zarray/.zattrs/.zgroup from there.
  • open_daf("https://...") dispatches to http_daf or zarr_daf based on the URL suffix. HTTP backends are read-only; writable modes hard-error. *.daf.zarr.zip URLs are explicitly out of scope and redirect users to opening the underlying .daf.zarr directory.
  • New pack_files_daf_metadata(path) exported helper to bundle a FilesDaf tree’s JSON metadata into metadata.zip (for trees written by older dafr or modified outside dafr).
  • FilesDaf now maintains metadata.zip automatically on every set_* / delete_* / add_axis / delete_axis / reorder_axes operation, plus a one-shot rebuild on writable open if the bundle is missing. Mirrors upstream DataAxesFormats.jl::FilesFormat. Pre-0.2.0 stores are picked up automatically the first time they’re opened with mode "r+" or "w+".
  • axes/metadata.json sidecar now maintained by FilesDaf (sorted JSON array of axis names). Required by HTTP clients to enumerate axes without GET-ing every axes/*.txt.
  • New Imports: httr2 (and transitively curl).

MmapZipStore + Zarr zip backend

  • New MmapZipStore (C++ in src/mmap_zip_store.cpp) backs ZarrDaf with a single ZIP archive on the local filesystem. open_daf() and zarr_daf() now accept .daf.zarr.zip paths and return a working ZarrDaf / ZarrDafReadOnly.
  • Reads use a shared mmap of the archive: stored (method-0) entries are returned as zero-copy ALTREP RAW views via a new ZipRawAltrep class. Deflate-compressed (method-8) entries are decompressed on demand via system zlib; deflate64 / other methods raise a clear error pointing to a stored / deflate re-save.
  • Writes append entries via upstream’s two-step commit protocol (commit central directory + EOCD first, then write the local file header and data into the now-sparse hole). Crash-safe: a writable open’s recovery pass detects partial commits via tail validation (LFH signature + data CRC32) and rolls back the trailing run of invalid entries before returning. Internal tick-counter hooks at every commit-able decision point let recovery be tested deterministically (5 tick points; tests gated on NOT_CRAN=true).
  • Always emits ZIP64 (per upstream DataAxesFormats.jl); every local file header is padded with a 0xDAF1 extra field so the data region starts at an 8-byte-aligned file offset (zero-copy unaligned-load safety on every host architecture).
  • ALTREP safety net: when the store closes (or is GC’d), every outstanding ALTREP vector it produced is deactivated — length() returns 0 and Dataptr() returns a stable inert byte. R callers who keep references past close get clean empty raws instead of segfaults.
  • Internal-only dafr:::dafr_mmap_zip_reserve() / dafr:::dafr_mmap_zip_patch_crc() expose two-phase fill for large sparse arrays (writable in-place ALTREP view + post-fill CRC patch). Crash between reserve and patch rolls back via the same CRC-mismatch path as ordinary partial commits.
  • SystemRequirements: zlib (linked via -lz).
  • Cross-language smoke: dafr-written .daf.zarr.zip archives open cleanly in Python via zipfile and zarr.open(zarr.storage.ZipStore(...)). Foreign zips written by python -m zipfile (stored or deflate) open cleanly in dafr.
  • Mirrors DataAxesFormats.jl mmap_zip_store.jl (~1070 LOC of Julia ported to ~1300 LOC of C++ + cpp11 + ALTREP).
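
The 8-byte alignment mentioned above comes down to simple arithmetic on the local file header. The sketch below is not dafr's code: padding_for_alignment() is hypothetical, and it assumes the padding record is the only extra field in the header, which may not hold for the real archives:

```r
# Hypothetical helper, not dafr's code. ZIP local file header layout:
# 30 fixed bytes, then the file name, then the extra field, then the data.
# An extra-field record has a 4-byte header (2-byte id + 2-byte size).
padding_for_alignment <- function(header_offset, name_length, alignment = 8) {
  unpadded_data_offset <- header_offset + 30 + name_length + 4
  (-unpadded_data_offset) %% alignment   # payload bytes needed in the record
}

pad <- padding_for_alignment(header_offset = 0, name_length = 5)
pad                              # 1
(0 + 30 + 5 + 4 + pad) %% 8      # 0: data starts on an 8-byte boundary
```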

ZarrDaf backend

  • New zarr_daf(uri, mode, name) backend reading and writing Zarr v2. Two store impls: DirStore (filesystem directory tree) and DictStore (in-memory). Zip-backed Zarr is also supported via the MmapZipStore backend (see above).
  • New files_to_zarr(src, dst) and zarr_to_files(src, dst) conversion helpers (same-filesystem only; correctness-first implementation re-encodes through the public API; hard-link optimization deferred as a perf follow-up).
  • open_daf("foo.daf.zarr") now returns a ZarrDaf.
  • Compression policy: dafr writes Zarr chunks uncompressed; reads uncompressed and gzip; rejects blosc/zstd/lz4 with a clear error pointing to re-save with compressor=None.
  • Sparse layouts mirror upstream DataAxesFormats.jl: 1-based on-disk indices for nzind / colptr / rowval; sparse-Bool all-TRUE skips nzval (storage compaction). Cross-language parity is verified via gated Python zarr.open() smoke tests.
  • Mirrors DataAxesFormats.jl v0.2.0 commits ea4b5f9 (Zarr v2 directory tree), 8cc3ff6 (in-memory store), 47e7693 (CRC fix — N/A for our in-memory layer), 79034fd (.zmetadata consolidation), 46d4ab2 (Files↔︎Zarr conversion).
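
The all-TRUE sparse-Bool compaction can be illustrated in base R. These helpers are hypothetical and do not touch Zarr or dafr; they only show why 1-based positions are enough when every stored entry is TRUE:

```r
# Hypothetical helpers, not dafr's code: store only the 1-based positions of
# the TRUE entries, with no separate values array.
encode_sparse_bool <- function(x) {
  list(length = length(x), nzind = which(x))   # which() is 1-based
}

decode_sparse_bool <- function(encoded) {
  x <- logical(encoded$length)   # all FALSE
  x[encoded$nzind] <- TRUE       # every stored entry is TRUE, so no nzval
  x
}

flags <- c(FALSE, TRUE, FALSE, TRUE, TRUE)
encoded <- encode_sparse_bool(flags)
encoded$nzind                                  # 2 4 5
identical(decode_sparse_bool(encoded), flags)  # TRUE
```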

reorder_axes() + open_daf() factory

  • New reorder_axes(daf, axis = perm, ...) permutes axis entries in place, rewriting every vector and matrix that depends on the axis. On files_daf the operation is crash-recoverable via a .reorder.backup/ directory of hardlinks; on the next files_daf(path, mode = "r+" | "w+") open, any in-progress reorder is automatically rolled back to the pre-reorder state.
  • New reset_reorder_axes(daf) to manually trigger recovery (mostly redundant given the auto-recovery on open).
  • New open_daf(uri, mode, name) factory function that dispatches on path / URL pattern: memory:// (or no path) → memory_daf, filesystem path → files_daf, *.daf.zarr / *.daf.zarr.zip → zarr_daf, http(s):// → http_daf. The factory replaces the previous filesystem-only open_daf from R/complete.R.
  • Mirrors DataAxesFormats.jl v0.2.0 commits 90301ff, 070bd34 (axis reordering) and b40377f (open_daf factory).
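
The dispatch rules for open_daf() can be read as a small decision function. The base R sketch below is not dafr's implementation: backend_for() is hypothetical, it returns the backend name instead of opening anything, and the regular expressions are one plausible reading of the patterns listed above:

```r
# Hypothetical sketch, not dafr's code: name the backend for a given URI.
backend_for <- function(uri = NULL) {
  if (is.null(uri) || startsWith(uri, "memory://")) return("memory_daf")
  is_zarr_dir <- grepl("\\.daf\\.zarr/?$", uri)
  is_zarr_zip <- grepl("\\.daf\\.zarr\\.zip$", uri)
  if (grepl("^https?://", uri)) {
    # Zip-backed Zarr over HTTP is out of scope, so fail early.
    if (is_zarr_zip) stop("zip-backed Zarr is not supported over HTTP", call. = FALSE)
    return(if (is_zarr_dir) "zarr_daf" else "http_daf")
  }
  if (is_zarr_dir || is_zarr_zip) "zarr_daf" else "files_daf"
}

backend_for()                              # "memory_daf"
backend_for("data/cells.daf")              # "files_daf"
backend_for("cells.daf.zarr.zip")          # "zarr_daf"
backend_for("https://host/foo.daf.zarr/")  # "zarr_daf"
backend_for("https://host/foo/")           # "http_daf"
```

The HTTP check comes before the suffix check so that a remote .daf.zarr URL and a local .daf.zarr path both reach zarr_daf while remote zip archives are rejected.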

Internal: per-item cache_group refactor

The internal format API now returns per-item cache classifications, matching DataAxesFormats.jl v0.2.0 (upstream commit 49fbba1). No user-visible behavior change.

  • Every backend format_get_* method (scalar/axis_array/vector/matrix) returns list(value, cache_group) instead of a bare value.
  • Every backend format_set_* method returns the cache_group constant for the just-written value (or NULL) instead of invisible().
  • New exported character constants MEMORY_DATA, MAPPED_DATA, QUERY_DATA — accepted by empty_cache(daf, clear = ...) / keep = ... alongside the existing lowercase forms.
  • The reader-level cache (R/readers.R) now consults the backend-returned cache_group when storing fresh reads, instead of hardcoding the "memory" tier. mmap-eligible reads on files_daf now correctly land in the "mapped" tier.
  • Per-item classification: files_daf returns MEMORY_DATA for string/factor reads (R’s CHARSXP cache makes mmap moot for strings) and MAPPED_DATA for everything else. Matches upstream’s structural classification — no size thresholds.

This refactor is preparatory for the ZarrDaf and HttpDaf backends, which require per-item classification to drive their internal caching.
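
The per-item classification rule can be sketched in base R as follows. This is an illustration only: classify_read() is hypothetical, and the two constants are local stand-ins whose values are assumed to match the lowercase tier names mentioned above:

```r
# Local stand-ins; the values of dafr's exported constants are assumed here.
MEMORY_DATA <- "memory"
MAPPED_DATA <- "mapped"

# Hypothetical classifier following the rule above: strings and factors go
# to the memory tier, everything else to the mapped tier.
classify_read <- function(value) {
  is_string_like <- is.character(value) || is.factor(value)
  cache_group <- if (is_string_like) MEMORY_DATA else MAPPED_DATA
  list(value = value, cache_group = cache_group)
}

classify_read(c("a", "b"))$cache_group   # "memory"
classify_read(c(1.5, 2.5))$cache_group   # "mapped"
```

Returning the value and its tier together is what lets a reader-level cache file each read under the right tier without guessing.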

dafr 0.1.0 (development)

Query DSL: Julia-parity parser additions

Three DataAxesFormats.jl query-DSL features that were previously missing have been implemented, closing the last semantic gaps between dafr and the upstream Julia package:

  • >> Reduction reduces a vector or matrix to a scalar (e.g. @ gene : is_lateral >> Sum type Int64, or @ cell @ gene :: UMIs >> Sum). >> is no longer silently aliased to >|; on a grouped input (... / g >> Sum) it continues to produce a per-group vector, as before.
  • @ axis = entry picks one entry from a vector (@ cell : age @ cell = N89) or one cell from a matrix (@ cell @ gene :: UMIs @ cell = C @ gene = X). Two successive picks collapse a matrix to a scalar.
  • || value type T attaches a Julia-style dtype (Bool, Int8..Int64, UInt8..UInt64, Float32, Float64, String) to a scalar-lookup default, matching the existing behaviour of the same suffix inside reductions and element-wise ops.
  • IfMissing() builder gains an optional type kwarg (IfMissing(0, type = "Int64")).

dafr 0.1.0

First public release.

A native R + C++ implementation of the DataAxesFormats (DAF) data model for multi-dimensional data along arbitrary axes, ported from the Julia reference implementation with no Julia dependency.

Features

See the pkgdown site for full documentation.