Mmap Zip Store

DataAxesFormats.MmapZipStores Module

A memory-mapped, append-only Zarr storage backend implemented over a single ZIP archive.

This module provides MmapZipStore, a Zarr.AbstractStore subtype that can back a ZarrDaf (or, in principle, any Zarr array) with a ZIP file on the local filesystem. It serves two complementary use cases:

  • Reading any valid Zarr v2 ZIP archive (including archives produced by foreign tools such as Python's zarr package), subject to Zarr.jl's existing support for data types, filters, and compressors. Stored (method 0) entries are returned as zero-copy memory-mapped byte ranges; deflate-compressed (method 8) and deflate64-compressed (method 9) entries are decompressed on demand via ZipArchives.jl. Any other compression method raises a clear ArgumentError from ZipArchives.jl on first access. In practice Zarr-ZIPs in the wild are overwhelmingly method 0 (since the chunks are already compressed internally) or method 8.

  • Creating and appending to a ZIP archive written by this package. Writes use stored (method 0) uncompressed entries exclusively, so chunk data can be memory-mapped for direct access. Entries may only be appended; existing entries cannot be modified or deleted.

Shared mmap

On open, MmapZipStore memory-maps the archive file once into a single Vector{UInt8} owned by the store. A read-only open uses an ordinary file-backed mmap covering exactly the current file size. A writable open uses a two-step mapping that keeps the virtual address of the archive stable across file growth: first, max_file_size bytes of virtual address space are reserved via an anonymous PROT_NONE mapping (which consumes virtual address space only: zero RAM, zero disk, zero file bytes); then the file is overlaid onto the first filesize bytes of that reservation via MAP_SHARED | MAP_FIXED. Each append calls ftruncate to extend the real (non-sparse) file, followed by a re-overlay with MAP_SHARED | MAP_FIXED at the same base address to extend the accessible portion of the reservation to the new file size. Subsequent writes (local file header, data, central directory, end-of-central-directory, CRC32 patches) are pure stores into store.file_mmap, not write() syscalls. The only writes through the IO stream are the initial bootstrap of an empty archive and ftruncate calls. Each open therefore consumes a single reservation plus one file overlay regardless of entry count, and every stored (method-0) entry is served directly out of the shared mapping with no copy. The file on disk remains a normal, non-sparse file of exactly filesize bytes; copying the archive with ordinary tools does not inflate to max_file_size.
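
In terms of raw libc calls, the mapping scheme looks roughly like the following sketch. The flag constants are Linux x86-64 values (macOS differs, e.g. MAP_ANONYMOUS is 0x1000 there), and the helper names are illustrative, not the module's actual internals:

# Illustrative sketch only; Linux flag values, hypothetical helper names.
const PROT_NONE, PROT_READ, PROT_WRITE = Cint(0), Cint(1), Cint(2)
const MAP_SHARED, MAP_PRIVATE, MAP_FIXED, MAP_ANONYMOUS = Cint(1), Cint(2), Cint(16), Cint(32)

function reserve_address_space(max_file_size::Integer)::Ptr{Cvoid}
    # Anonymous PROT_NONE mapping: consumes virtual addresses only, with
    # zero RAM, zero disk, and zero file bytes behind them.
    base = ccall(:mmap, Ptr{Cvoid}, (Ptr{Cvoid}, Csize_t, Cint, Cint, Cint, Int64),
                 C_NULL, max_file_size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, Cint(-1), 0)
    base == Ptr{Cvoid}(-1) && error("mmap reservation failed")
    return base
end

function overlay_file!(base::Ptr{Cvoid}, fd::Integer, file_size::Integer)::Nothing
    # Map the file over the first file_size bytes of the reservation at the
    # same fixed base address; done once at open and again after each growth.
    ptr = ccall(:mmap, Ptr{Cvoid}, (Ptr{Cvoid}, Csize_t, Cint, Cint, Cint, Int64),
                base, file_size, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, Cint(fd), 0)
    ptr == base || error("MAP_FIXED overlay failed")
    return nothing
end

function grow_and_reoverlay!(base::Ptr{Cvoid}, fd::Integer, new_file_size::Integer)::Nothing
    # Extend the file (new bytes read as zeros until written through the mmap),
    # then widen the accessible window of the reservation.
    rc = ccall(:ftruncate, Cint, (Cint, Int64), Cint(fd), new_file_size)
    rc == 0 || error("ftruncate failed")
    return overlay_file!(base, fd, new_file_size)
end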

On-disk protocol

MmapZipStore uses a two-step commit protocol that leaves the archive in a valid ZIP and valid Zarr state after every append, with no need to wait for a final close:

  1. For each append, the file is extended via ftruncate to its new end-of-archive position. The new central directory (containing both the pre-existing and the new entries) and its end-of-central-directory record are built in memory and copied into the mmap at the offset where the new local file header region will end. This is the commit point: after this copy, the archive on disk describes the new entry, and the local file header region lies in a sparse hole in the file (the bytes zero-initialized by ftruncate).

  2. The new local file header is then copied to the offset that was previously occupied by the old central directory (and end-of-central-directory record), and the entry's stored data bytes are copied immediately after it. These copies may overlap what used to be the old central directory: that is safe, because step 1 already committed the superseding copy to a higher offset in the file (a schematic of this ordering follows the recovery description below).

If the process crashes between step 1 and step 2, the committed central directory claims an entry whose local file header is still partly (or entirely) missing or whose data's CRC32 does not match the recorded value. The next write-mode open detects this by validating the tail of the central directory from back to front; the first trailing run of invalid entries is rolled back by writing a new central directory and end-of-central-directory record at the oldest corrupt entry's local header offset, and ftruncate-ing the file to the new end-of-central-directory.
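
The offset arithmetic and write ordering can be sketched as a pure in-memory schematic, with a Vector{UInt8} standing in for the shared mmap and resize! standing in for the ftruncate-plus-re-overlay growth. The SketchStore struct and all names here are illustrative, not the module's actual internals:

# Pure in-memory schematic of one append; names are hypothetical.
mutable struct SketchStore
    file_mmap::Vector{UInt8}   # stand-in for the shared mmap
    cdir_offset::Int           # 0-based offset of the current central directory
end

# Stand-in for ftruncate + MAP_SHARED | MAP_FIXED re-overlay.
grow_file!(store::SketchStore, new_size::Int) = resize!(store.file_mmap, new_size)

function append_entry!(store::SketchStore, local_header::Vector{UInt8},
                       data::Vector{UInt8}, new_cdir_and_eocd::Vector{UInt8})
    lfh_offset = store.cdir_offset                # new header replaces the old cdir
    data_offset = lfh_offset + length(local_header)
    new_cdir_offset = data_offset + length(data)  # new cdir lands past the new data
    grow_file!(store, new_cdir_offset + length(new_cdir_and_eocd))
    # Step 1 (commit point): the superseding central directory + EOCD go in
    # first, at the higher offset.
    copyto!(store.file_mmap, new_cdir_offset + 1, new_cdir_and_eocd, 1, length(new_cdir_and_eocd))
    # Step 2: local header and data fill the region below, possibly overlapping
    # the old central directory, which step 1 already superseded.
    copyto!(store.file_mmap, lfh_offset + 1, local_header, 1, length(local_header))
    copyto!(store.file_mmap, data_offset + 1, data, 1, length(data))
    store.cdir_offset = new_cdir_offset
    return nothing
end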

Two-phase append for get_empty_*

The store exposes reserve_mmap_zip_entry! and patch_mmap_zip_entry_crc! to support Daf's two-phase get_empty_* / filled_empty! pattern without buffering gigabytes of zeros in memory. reserve_mmap_zip_entry! runs the full commit protocol with a CRC32 placeholder of 0 and returns a byte view over the data region in the shared mmap (a file hole until the user writes into it). patch_mmap_zip_entry_crc! then computes the real CRC32 from the now-filled data and patches the CRC32 field in both the local file header and the central directory via two four-byte stores into the shared mmap.

If the process crashes between reserve_mmap_zip_entry! and patch_mmap_zip_entry_crc!, the recovery pass on the next write-mode open discards the partial entries because their stored CRC32 placeholders of 0 do not match the actual data.

Aligned data offsets

Every local file header written by MmapZipStore is padded (via a second opaque ZIP extra field) so that the following data region starts at a DAF_DATA_OFFSET_ALIGNMENT-byte-aligned file offset. This lets readers wrap the data region as an Array{T} of the appropriate element type via unsafe_wrap with no copy. try_mmap_entry_as performs the alignment check at read time and returns nothing for unaligned foreign archives, in which case the caller should fall back to the ordinary decoded copy from store[key].
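
A minimal sketch of the zero-copy wrap this enables, assuming an alignment of 64 bytes as a stand-in for DAF_DATA_OFFSET_ALIGNMENT (the function name is illustrative):

# Hypothetical illustration; 64 stands in for DAF_DATA_OFFSET_ALIGNMENT.
function wrap_aligned(file_mmap::Vector{UInt8}, data_offset::Integer,
                      ::Type{T}, dims::Dims) where {T}
    # The mmap base is page-aligned, so an aligned file offset implies an
    # equally aligned pointer in memory.
    data_offset % 64 == 0 || return nothing    # unaligned foreign archive
    ptr = Ptr{T}(pointer(file_mmap) + data_offset)
    return unsafe_wrap(Array, ptr, dims)       # zero-copy view into the mmap
end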

Limitations

Cross-process writers to the same ZIP archive are not supported and will corrupt the archive. Concurrent access from multiple threads within the same process is not supported either: the store mutates its in-memory entry tables during appends without any internal locking, matching the thread-safety conventions of Zarr.jl's other built-in stores (DirectoryStore, DictStore, ZipStore). A higher-level writer lock (such as the one held by ZarrDaf) is assumed to serialize writes. Concurrent readers across threads are safe as long as no writer is active at the same time: the commit protocol leaves a valid on-disk archive at every commit point.

All archives produced on write are ZIP64 archives: every local file header and every central directory entry carries the ZIP64 extended information extra field, and the archive always ends with a ZIP64 end-of-central-directory record, a ZIP64 end-of-central-directory locator, and a legacy end-of-central-directory record (whose size/count fields are set to the ZIP64 sentinel values). This accommodates the multi-gigabyte chunks and many-thousand-entry archives that are routine for large Daf data sets, at the cost of ~28 bytes per entry in the central directory and a 98-byte trailing record region instead of the legacy 22-byte record. Modern ZIP readers (Info-ZIP, Python zipfile, 7-Zip, ZipArchives.jl, Java, .NET) all handle this transparently.
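
The trailing-record byte budget checks out against the fixed record sizes in the ZIP specification:

# Fixed sizes of the three trailing records, per the ZIP appnote:
zip64_eocd_record  = 56   # ZIP64 end-of-central-directory record
zip64_eocd_locator = 20   # ZIP64 end-of-central-directory locator
legacy_eocd_record = 22   # legacy end-of-central-directory record
@assert zip64_eocd_record + zip64_eocd_locator + legacy_eocd_record == 98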

DataAxesFormats.MmapZipStores.MmapZipStore Type
MmapZipStore(
    path::AbstractString;
    [writable::Bool = false,
    create::Bool = false,
    truncate::Bool = false,
    max_file_size::Integer = 1 << 40]
)

Open (and optionally create or truncate) a ZIP archive at path as a Zarr store.

The writable, create, and truncate flags interact as follows (matching ZarrDaf's r / r+ / w+ / w modes):

writable  create  truncate  Behavior
false     false   false     Read-only open of an existing archive (mode r)
true      false   false     Read/write open of an existing archive (mode r+)
true      true    false     Read/write open, creating an empty archive if missing (mode w+)
true      true    true      Discard any existing archive and create an empty one (mode w)

On a writable open, the store reserves max_file_size bytes of virtual address space via a single anonymous PROT_NONE mapping and overlays the file onto the first filesize bytes of that reservation (MAP_SHARED | MAP_FIXED). Each append calls ftruncate to grow the file by exactly the bytes needed (real, non-sparse) and re-overlays the file at the same base address to extend the accessible portion of the reservation. Reads slice into this single mapping, so the number of VMAs per open is small and fixed regardless of entry count. An append that would grow the file past max_file_size fails with an explicit error. Read-only opens memory-map exactly the current file size and ignore max_file_size.

On open, the existing central directory is parsed and cached in memory. On a write-mode open, an interrupted tail of the central directory (entries whose local file header or CRC32 does not validate) is detected and rolled back; see the module documentation for the full protocol.
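
For example (the file name is illustrative):

# Create the archive if missing and open it read/write (mode w+), with the
# default 1 TiB (1 << 40 bytes) address-space reservation:
store = MmapZipStore("cells.zarr.zip"; writable = true, create = true)

# A later read-only open (mode r) maps exactly the current file size:
reader = MmapZipStore("cells.zarr.zip")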

DataAxesFormats.MmapZipStores.try_mmap_entry_as Function
try_mmap_entry_as(
    store::MmapZipStore,
    key::AbstractString,
    ::Type{T},
    dims::Union{Integer, Tuple{Vararg{Integer}}},
)::Union{Nothing, Array{T}} where {T}

If the entry named key exists in store, is held uncompressed (stored, method 0), has exactly the byte size implied by T and dims, and its data region is suitably aligned for T, return a zero-copy Array{T} of shape dims viewing the mmap'd data region directly. Return nothing otherwise (absent, compressed, wrong size, or unaligned) and let the caller fall back to the ordinary decoded copy from store[key].

For archives produced by MmapZipStore itself, the alignment precondition always holds: every local file header is padded so the data region starts at a DAF_DATA_OFFSET_ALIGNMENT-byte-aligned file offset, which matches the alignment required by every Daf element type. Foreign archives may produce misaligned data offsets, in which case this returns nothing.

The returned array aliases store.file_mmap and remains valid as long as store is open.
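
For example, with the fallback path spelled out (the key, element type, and shape are illustrative; the fallback assumes the entry holds raw little-endian Float32 bytes):

chunk = try_mmap_entry_as(store, "expression/0.0", Float32, (100, 200))
if chunk === nothing
    # Absent, compressed, wrong size, or unaligned: fall back to the ordinary
    # decoded copy of the entry's bytes.
    bytes = store["expression/0.0"]
    chunk = reshape(reinterpret(Float32, bytes), 100, 200)
end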

DataAxesFormats.MmapZipStores.reserve_mmap_zip_entry! Function
reserve_mmap_zip_entry!(
    store::MmapZipStore,
    key::AbstractString,
    data_size::Integer,
)::AbstractVector{UInt8}

Reserve space for a new entry of data_size bytes with a placeholder CRC32 of 0, and return an mmap-backed byte view over the reserved data region. The caller fills the returned buffer in place and then must call patch_mmap_zip_entry_crc! before any further appends.

If the caller crashes between the reserve and patch steps, the next write-mode open will detect the placeholder CRC mismatch and roll the reservation back.
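
A sketch of the full two-phase pattern; the key and size are illustrative, and the assumption that patch_mmap_zip_entry_crc! takes the store and key is hypothetical (check its actual signature):

n = 1_000_000
buffer = reserve_mmap_zip_entry!(store, "values/0", n * sizeof(Float32))
values = reinterpret(Float32, buffer)   # zero-copy view over the file hole
values .= 1.0f0                         # fill in place; no zeros buffered in RAM
patch_mmap_zip_entry_crc!(store, "values/0")  # hypothetical signature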
