# Mmap Zip Store

`DataAxesFormats.MmapZipStores` — Module

A memory-mapped, append-only Zarr storage backend implemented over a single ZIP archive.

This module provides `MmapZipStore`, a `Zarr.AbstractStore` subtype that can back a `ZarrDaf` (or, in principle, any Zarr array) with a ZIP file on the local filesystem. It serves two complementary use cases:
- Reading any valid Zarr v2 ZIP archive (including archives produced by foreign tools such as Python's `zarr` package), subject to `Zarr.jl`'s existing support for data types, filters, and compressors. Stored (method 0) entries are returned as zero-copy memory-mapped byte ranges; deflate-compressed (method 8) and deflate64-compressed (method 9) entries are decompressed on demand via `ZipArchives.jl`. Any other compression method raises a clear `ArgumentError` from `ZipArchives.jl` on first access. In practice, Zarr ZIP archives in the wild are overwhelmingly method 0 (since the chunks are already compressed internally) or method 8.
- Creating and appending to a ZIP archive written by this package. Writes use stored (method 0) uncompressed entries exclusively, so chunk data can be memory-mapped for direct access. Entries may only be appended; existing entries cannot be modified or deleted.
## Shared mmap

On open, `MmapZipStore` memory-maps the archive file once into a single `Vector{UInt8}` owned by the store. A read-only open uses an ordinary file-backed `mmap` covering exactly the current file size. A writable open uses a two-step mapping that keeps the virtual address of the archive stable across file growth: first, `max_file_size` bytes of virtual address space are reserved via an anonymous `PROT_NONE` mapping (which consumes virtual address space only: zero RAM, zero disk, zero file bytes); then the file is overlaid onto the first `filesize` bytes of that reservation via `MAP_SHARED | MAP_FIXED`. Each append calls `ftruncate` to extend the real (non-sparse) file, followed by a re-overlay with `MAP_SHARED | MAP_FIXED` at the same base address to extend the accessible portion of the reservation to the new file size.

Subsequent writes (local file header, data, central directory, end-of-central-directory, CRC32 patches) are pure stores into `store.file_mmap`, not `write()` syscalls. The only writes through the IO stream are the initial bootstrap of an empty archive and the `ftruncate` calls. Each open therefore consumes a single reservation plus one file overlay regardless of entry count, and every stored (method 0) entry is served directly out of the shared mapping with no copy. The file on disk remains a normal, non-sparse file of exactly `filesize` bytes; copying the archive with ordinary tools does not inflate it to `max_file_size`.
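The reserve-then-overlay mechanism can be sketched with raw POSIX calls. The following is a minimal illustration, assuming a 64-bit Linux or macOS system; the constants are the usual POSIX values and the function names are illustrative stand-ins, not the package's actual internals:

```julia
# POSIX constants (Linux values; MAP_ANONYMOUS is 0x1000 on macOS).
const PROT_NONE = Cint(0)
const PROT_READ = Cint(1)
const PROT_WRITE = Cint(2)
const MAP_SHARED = Cint(0x01)
const MAP_PRIVATE = Cint(0x02)
const MAP_FIXED = Cint(0x10)
const MAP_ANONYMOUS = Sys.isapple() ? Cint(0x1000) : Cint(0x20)
const MAP_FAILED = Ptr{Cvoid}(-1)

# Reserve max_file_size bytes of virtual address space: no RAM, no disk bytes.
function reserve_address_space(max_file_size::Integer)::Ptr{UInt8}
    base = ccall(:mmap, Ptr{Cvoid},
                 (Ptr{Cvoid}, Csize_t, Cint, Cint, Cint, Int64),
                 C_NULL, max_file_size, PROT_NONE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0)
    base == MAP_FAILED && error("failed to reserve $(max_file_size) bytes")
    return Ptr{UInt8}(base)
end

# Overlay the file onto the first file_size bytes of the reservation.
# MAP_FIXED pins the mapping to the reserved base address, so pointers into
# the mapping remain valid across later growth.
function overlay_file!(base::Ptr{UInt8}, fd::Integer, file_size::Integer)::Nothing
    ptr = ccall(:mmap, Ptr{Cvoid},
                (Ptr{Cvoid}, Csize_t, Cint, Cint, Cint, Int64),
                base, file_size, PROT_READ | PROT_WRITE,
                MAP_SHARED | MAP_FIXED, fd, 0)
    ptr == MAP_FAILED && error("failed to overlay file")
    return nothing
end

# Extend the file on disk, then re-overlay at the same base address to make
# the new bytes accessible through the existing mapping.
function grow!(base::Ptr{UInt8}, fd::Integer, new_size::Integer)::Nothing
    rc = ccall(:ftruncate, Cint, (Cint, Int64), fd, new_size)
    rc == 0 || error("ftruncate to $(new_size) bytes failed")
    return overlay_file!(base, fd, new_size)
end
```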
## On-disk protocol

`MmapZipStore` uses a two-step commit protocol that leaves the archive in a valid ZIP and valid Zarr state after every append, with no need to wait for a final close (see the sketch after this list):

1. For each append, the file is extended via `ftruncate` to its new end-of-archive position. The new central directory (containing both the pre-existing and the new entries) and its end-of-central-directory record are built in memory and copied into the mmap at the offset where the new local file header region will end. This is the commit point: after this copy, the archive on disk describes the new entry, and the local file header region lies in a sparse hole in the file (the bytes zero-initialized by `ftruncate`).
2. The new local file header is then copied at the offset that was previously occupied by the old central directory (and end-of-central-directory record), and the entry's stored data bytes are copied immediately after it. These copies may overlap what used to be the old central directory; that is safe, because step 1 already committed the superseding copy to a higher offset in the file.
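In code, the commit order looks roughly like this. All names below (`build_local_file_header`, `build_central_directory_and_eocd`, `grow!`, the `store` fields) are hypothetical stand-ins for the package's internals; offsets are zero-based file positions, with `+ 1` converting to 1-based array indices:

```julia
# Sketch of one append under the two-step commit protocol.
function append_entry!(store, key::AbstractString, data::AbstractVector{UInt8})
    old_cd_offset = store.central_directory_offset          # start of the old CD
    header = build_local_file_header(key, data)             # hypothetical helper
    tail = build_central_directory_and_eocd(store, key, data)  # old entries + new

    data_offset = old_cd_offset + length(header)
    new_cd_offset = data_offset + length(data)
    grow!(store, new_cd_offset + length(tail))              # ftruncate + re-overlay

    # Step 1 (commit point): the superseding central directory and EOCD land
    # past the still-zeroed local file header region.
    copyto!(store.file_mmap, new_cd_offset + 1, tail, 1, length(tail))

    # Step 2: fill in the local file header and data; these copies may overwrite
    # the old central directory, which step 1 already superseded.
    copyto!(store.file_mmap, old_cd_offset + 1, header, 1, length(header))
    copyto!(store.file_mmap, data_offset + 1, data, 1, length(data))

    store.central_directory_offset = new_cd_offset
    return nothing
end
```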
If the process crashes between step 1 and step 2, the committed central directory claims an entry whose local file header is still partly (or entirely) missing or whose data's CRC32 does not match the recorded value. The next write-mode open detects this by validating the tail of the central directory from back to front; the first trailing run of invalid entries is rolled back by writing a new central directory and end-of-central-directory record at the oldest corrupt entry's local header offset, and `ftruncate`-ing the file to the new end-of-central-directory.
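A rough shape of that recovery pass, again with hypothetical helper names (`entry_is_valid`, `rewrite_central_directory_at!`) standing in for the real validation and rewrite logic:

```julia
# Sketch of the write-mode-open recovery: validate cached central-directory
# entries from back to front, then roll back the trailing invalid run.
function rollback_interrupted_tail!(store)::Nothing
    first_invalid = length(store.entries) + 1
    for index in reverse(eachindex(store.entries))
        # Stop at the first entry whose local header and CRC32 both validate;
        # everything up to and including it was fully committed.
        entry_is_valid(store, store.entries[index]) && break
        first_invalid = index
    end
    first_invalid > length(store.entries) && return nothing  # clean archive

    rollback_offset = store.entries[first_invalid].local_header_offset
    resize!(store.entries, first_invalid - 1)
    # Write a fresh central directory + EOCD at the oldest corrupt entry's
    # local header offset, then ftruncate the file to the new end.
    rewrite_central_directory_at!(store, rollback_offset)
    return nothing
end
```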
## Two-phase append for `get_empty_*`

The store exposes `reserve_mmap_zip_entry!` and `patch_mmap_zip_entry_crc!` to support Daf's two-phase `get_empty_*` / `filled_empty!` pattern without buffering gigabytes of zeros in memory. `reserve_mmap_zip_entry!` runs the full commit protocol with a CRC32 placeholder of `0` and returns a byte view over the data region in the shared mmap (a file hole until the user writes into it). `patch_mmap_zip_entry_crc!` then computes the real CRC32 from the now-filled data and patches the CRC32 field in both the local file header and the central directory via two four-byte stores into the shared mmap.

If the process crashes between `reserve_mmap_zip_entry!` and `patch_mmap_zip_entry_crc!`, the recovery pass on the next write-mode open discards the partial entries because their stored CRC32 placeholders of `0` do not match the actual data.
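Typical usage of the pair, with an illustrative key and payload (not taken from the package's own callers):

```julia
values = rand(Float32, 1_000_000)               # illustrative payload
buffer = reserve_mmap_zip_entry!(store, "matrix/0.0", sizeof(values))
copyto!(buffer, reinterpret(UInt8, values))     # fill the mmap-backed view in place
patch_mmap_zip_entry_crc!(store, "matrix/0.0")  # compute and patch the real CRC32
```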
## Aligned data offsets

Every local file header written by `MmapZipStore` is padded (via a second opaque ZIP extra field) so that the following data region starts at a `DAF_DATA_OFFSET_ALIGNMENT`-byte-aligned file offset. This lets readers wrap the data region as an `Array{T}` of the appropriate element type via `unsafe_wrap` with no copy. `try_mmap_entry_as` performs the alignment check at read time and returns `nothing` for unaligned foreign archives, in which case the caller should fall back to the ordinary decoded copy from `store[key]`.
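The padding amount is a simple function of the unpadded data offset. A sketch, using the package's `DAF_DATA_OFFSET_ALIGNMENT` constant and eliding the extra-field encoding itself:

```julia
# How many padding bytes the extra field must contribute so the data region
# starts at an aligned file offset. A real writer must also account for the
# extra field's own 4-byte header (2-byte ID + 2-byte size) when nonzero.
function data_offset_padding(unpadded_data_offset::Integer)::Int
    misalignment = unpadded_data_offset % DAF_DATA_OFFSET_ALIGNMENT
    return misalignment == 0 ? 0 : Int(DAF_DATA_OFFSET_ALIGNMENT - misalignment)
end
```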
## Limitations

Cross-process writers to the same ZIP archive are *not* supported and will corrupt the archive. Concurrent access from multiple threads within the same process is not supported either: the store mutates its in-memory entry tables during appends without any internal locking, matching the thread-safety conventions of `Zarr.jl`'s other built-in stores (`DirectoryStore`, `DictStore`, `ZipStore`). A higher-level writer lock (such as the one held by `ZarrDaf`) is assumed to serialize writes. Concurrent readers across threads are safe as long as no writer is active at the same time: the commit protocol leaves a valid on-disk archive at every commit point.

All archives produced on write are ZIP64 archives: every local file header and every central directory entry carries the ZIP64 extended information extra field, and the archive always ends with a ZIP64 end-of-central-directory record, a ZIP64 end-of-central-directory locator, and a legacy end-of-central-directory record (whose size/count fields are set to the ZIP64 sentinel values). This accommodates the multi-gigabyte chunks and many-thousand-entry archives that are routine for large Daf data sets, at the cost of ~28 bytes per entry in the central directory and a 98-byte trailing record region instead of the legacy 22-byte record. Modern ZIP readers (Info-ZIP, Python's `zipfile`, 7-Zip, `ZipArchives.jl`, Java, .NET) all handle this transparently.
`DataAxesFormats.MmapZipStores.MmapZipStore` — Type

```julia
MmapZipStore(
    path::AbstractString;
    [writable::Bool = false,
    create::Bool = false,
    truncate::Bool = false,
    max_file_size::Integer = 1 << 40]
)
```
Open (and optionally create or truncate) a ZIP archive at `path` as a Zarr store.

The `writable`, `create`, and `truncate` flags interact as follows (matching `ZarrDaf`'s `r` / `r+` / `w+` / `w` modes):

| `writable` | `create` | `truncate` | Behavior |
|---|---|---|---|
| `false` | `false` | `false` | Read-only open of an existing archive (mode `r`) |
| `true` | `false` | `false` | Read/write open of an existing archive (mode `r+`) |
| `true` | `true` | `false` | Read/write open, creating an empty archive if missing (mode `w+`) |
| `true` | `true` | `true` | Discard any existing archive and create an empty one (mode `w`) |
On a writable open, the store reserves `max_file_size` bytes of virtual address space via a single anonymous `PROT_NONE` mapping and overlays the file onto the first `filesize` bytes of that reservation (`MAP_SHARED | MAP_FIXED`). Each append calls `ftruncate` to grow the file by exactly the bytes needed (real, non-sparse) and re-overlays the file at the same base address to extend the accessible portion of the reservation. Reads slice into this single mapping, so the number of VMAs per open is small and fixed regardless of entry count. An append that would grow the file past `max_file_size` fails with an explicit error. Read-only opens memory-map exactly the current file size and ignore `max_file_size`.
On open, the existing central directory is parsed and cached in memory. On a write-mode open, an interrupted tail of the central directory (entries whose local file header or CRC32 does not validate) is detected and rolled back; see the module documentation for the full protocol.
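For illustration, the four rows of the mode table correspond to calls like these (the path is hypothetical):

```julia
reader = MmapZipStore("cells.zarr.zip")                                  # mode r
editor = MmapZipStore("cells.zarr.zip"; writable = true)                 # mode r+
opened = MmapZipStore("cells.zarr.zip"; writable = true, create = true)  # mode w+
fresh = MmapZipStore("cells.zarr.zip";
                     writable = true, create = true, truncate = true)    # mode w
```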
`DataAxesFormats.MmapZipStores.try_mmap_entry_as` — Function

```julia
try_mmap_entry_as(
    store::MmapZipStore,
    key::AbstractString,
    ::Type{T},
    dims::Union{Integer, Tuple{Vararg{Integer}}},
)::Union{Nothing, Array{T}} where {T}
```
If the entry named `key` exists in `store`, is held uncompressed (stored, method 0), has exactly the byte size implied by `T` and `dims`, and its data region is suitably aligned for `T`, return a zero-copy `Array{T}` of shape `dims` viewing the mmap'd data region directly. Return `nothing` otherwise (absent, compressed, wrong size, or unaligned) and let the caller fall back to the ordinary decoded copy from `store[key]`.
For archives produced by `MmapZipStore` itself, the alignment precondition always holds: every local file header is padded so the data region starts at a `DAF_DATA_OFFSET_ALIGNMENT`-byte-aligned file offset, which matches the alignment required by every Daf element type. Foreign archives may contain misaligned data offsets, in which case this returns `nothing`.
The returned array aliases `store.file_mmap` and remains valid as long as `store` is open.
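A hypothetical read path showing the documented fallback (the key and shape are illustrative; `store[key]` yields the decoded bytes, here reinterpreted to the element type):

```julia
chunk = try_mmap_entry_as(store, "matrix/0.0", Float32, (1000, 1000))
if chunk === nothing
    # Compressed, unaligned, or foreign entry: take the ordinary decoded copy.
    chunk = reshape(reinterpret(Float32, store["matrix/0.0"]), 1000, 1000)
end
```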
`DataAxesFormats.MmapZipStores.reserve_mmap_zip_entry!` — Function

```julia
reserve_mmap_zip_entry!(
    store::MmapZipStore,
    key::AbstractString,
    data_size::Integer,
)::AbstractVector{UInt8}
```

Reserve space for a new entry of `data_size` bytes with a placeholder CRC32 of `0`, and return an mmap-backed byte view over the reserved data region. The caller fills the returned buffer in place and then must call `patch_mmap_zip_entry_crc!` before any further appends.

If the caller crashes between the reserve and patch steps, the next write-mode open will detect the placeholder CRC mismatch and roll the reservation back.
`DataAxesFormats.MmapZipStores.patch_mmap_zip_entry_crc!` — Function

```julia
patch_mmap_zip_entry_crc!(store::MmapZipStore, key::AbstractString)::Nothing
```

Compute the real CRC32 of the data region of the entry previously reserved via `reserve_mmap_zip_entry!` and patch the CRC32 field in both the local file header and the central directory record. Each patch is a single four-byte store into the shared mmap.