Daf

DataAxesFormats.DataAxesFormats Module

The DataAxesFormats package provides a uniform generic interface for accessing 1D and 2D data arranged along some set of axes. This is a much-needed generalization of the AnnData functionality. The key features are:

  • The data model (StorageTypes) includes (1) a set of axes with named entries, (2) vector data indexed by a single axis, (3) matrix data indexed by a pair of axes, and (4) scalar data (anything not tied to an axis).
  • Explicit control over the layout of 2D data (row or column major), with support for both dense and sparse matrices; both are crucial for performance.
  • Out of the box, data can be stored in memory (using MemoryDaf), directly inside HDF5 files (using H5df), or as a collection of simple files in a directory (using FilesDaf), which works nicely with tools like make for automating computation pipelines.
  • Import from and export to AnnData (via AnnDataFormat) for interoperability with non-Daf tools.
  • Implementation with a focus on memory-mapping to allow for efficient processing of large data sets (in theory, larger than the system's memory). In particular, merely opening a data set is a fast operation (almost) regardless of its size.
  • Well-defined interfaces for implementing additional storage Formats.
  • Creating Chains of data sets, allowing zero-copy reuse of common data between multiple computation pipelines.
  • Concatenation (Concat) of multiple data sets into a single data set along one or more axes.
  • A Query language for accessing the data, providing features such as slicing, aggregation and filtering, and making Views and Copies based on these queries.
  • Self-documenting Computations with explicit Contracts that describe and enforce their inputs and outputs, and Adapters for applying a computation to data in a different format.
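The data model from the first two bullets, and the idea of a chain, can be illustrated with a small conceptual sketch. It is written in Python and does NOT use the DataAxesFormats API: the ToyDaf and ToyChain classes and all their methods are hypothetical, plain lists stand in for dense matrices, sparse storage and memory-mapping are omitted, and the rule that later data sets in a chain override earlier ones is an assumption of this sketch.

```python
# Conceptual sketch of the Daf data model in plain Python.
# NOT the DataAxesFormats API: all class and method names here are hypothetical.


class ToyDaf:
    """An in-memory store with scalars, axes, vectors, and matrices."""

    def __init__(self, name):
        self.name = name
        self.scalars = {}   # name -> value (not tied to any axis)
        self.axes = {}      # axis -> list of unique entry names
        self.vectors = {}   # (axis, name) -> one value per axis entry
        self.matrices = {}  # (rows_axis, columns_axis, name) -> list of rows

    def set_scalar(self, name, value):
        self.scalars[name] = value

    def add_axis(self, axis, entries):
        if len(set(entries)) != len(entries):
            raise ValueError(f"entries of axis {axis!r} must be unique")
        self.axes[axis] = list(entries)

    def set_vector(self, axis, name, values):
        if len(values) != len(self.axes[axis]):
            raise ValueError(f"vector {name!r} needs one value per {axis!r} entry")
        self.vectors[(axis, name)] = list(values)

    def set_matrix(self, rows_axis, columns_axis, name, rows):
        # Stored in a single explicit layout: rows[i][j] holds the value for
        # entry i of rows_axis and entry j of columns_axis.
        n_rows = len(self.axes[rows_axis])
        n_columns = len(self.axes[columns_axis])
        if len(rows) != n_rows or any(len(row) != n_columns for row in rows):
            raise ValueError(f"matrix {name!r} has the wrong shape")
        self.matrices[(rows_axis, columns_axis, name)] = [list(row) for row in rows]

    def get_matrix(self, rows_axis, columns_axis, name):
        key = (rows_axis, columns_axis, name)
        if key in self.matrices:
            return self.matrices[key]
        # Only the other layout is stored: build a transposed copy on request.
        flipped = self.matrices[(columns_axis, rows_axis, name)]
        return [list(column) for column in zip(*flipped)]


class ToyChain:
    """A read-only view over several stores; later stores take precedence."""

    def __init__(self, *dafs):
        self.dafs = dafs

    def get_vector(self, axis, name):
        for daf in reversed(self.dafs):
            if (axis, name) in daf.vectors:
                return daf.vectors[(axis, name)]  # the same list object, not a copy
        raise KeyError(f"no vector {name!r} on axis {axis!r}")


base = ToyDaf("base")
base.set_scalar("organism", "human")
base.add_axis("cell", ["C1", "C2", "C3"])
base.add_axis("gene", ["G1", "G2"])
base.set_vector("cell", "batch", ["B1", "B1", "B2"])
base.set_matrix("cell", "gene", "UMIs", [[1, 0], [2, 3], [0, 4]])

derived = ToyDaf("derived")
derived.add_axis("cell", ["C1", "C2", "C3"])
derived.set_vector("cell", "type", ["T", "T", "B"])

chain = ToyChain(base, derived)
print(chain.get_vector("cell", "type"))         # ['T', 'T', 'B']
print(chain.get_vector("cell", "batch"))        # ['B1', 'B1', 'B2']
print(base.get_matrix("gene", "cell", "UMIs"))  # [[1, 2, 0], [0, 3, 4]]
```

In this sketch, only one layout of each matrix is stored and the other is produced as a full transposed copy on request, so the cost of switching layouts is visible to the caller; likewise, the chain returns the underlying list itself, which is the toy analogue of zero-copy reuse.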
Note

The top-level DataAxesFormats module re-exports (almost) everything from the sub-modules, so you can directly access any exported symbol by using DataAxesFormats (or, say, import DataAxesFormats: MemoryDaf), instead of having to import or use qualified names (such as DataAxesFormats.MemoryFormat.MemoryDaf).

The type hierarchy of Daf data sets looks like this:

Here are all the internal modules implementing this package and the relationship between them (linking to their documentation). They are also listed in the quick access bar on the left.

Module dependency graph (each arrow A -> B points from a module to the modules that build on it):

  • StorageTypes -> Registry
  • Registry -> Formats, Operations
  • Tokens -> Formats, Operations
  • Keys -> Formats
  • Operations -> FilesFormat, Queries
  • Formats -> Readers
  • Readers -> Writers, ReadOnly, Queries, Groups
  • Queries -> Views, Reconstruction
  • ReadOnly -> H5dfFormat, Views, FilesFormat
  • Writers -> H5dfFormat, MemoryFormat, Copies, Chains, FilesFormat, Reconstruction
  • Views -> Concat, Chains, Contracts
  • Contracts -> Computations
  • Computations -> Adapters
  • Copies -> Concat, Adapters
  • MemoryFormat -> ExampleData, AnnDataFormat, Adapters
  • Chains -> ExampleData, CompleteDaf, Adapters
  • FilesFormat -> ExampleData, CompleteDaf
  • H5dfFormat -> CompleteDaf
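As a reading aid, the arrows of the module graph can be fed to Python's standard graphlib module to obtain one bottom-up order in which every module appears after the modules it builds on. This is only an orientation sketch: the edge list is transcribed by hand from the graph, the reading that A -> B means "B builds on A" is an assumption, and the package itself does not define any such order.

```python
# Derive one bottom-up reading order for the modules from the graph's arrows.
# Assumption: an arrow A -> B means module B builds on module A.
from graphlib import TopologicalSorter

USED_BY = {
    "StorageTypes": ["Registry"],
    "Registry": ["Formats", "Operations"],
    "Tokens": ["Formats", "Operations"],
    "Keys": ["Formats"],
    "Operations": ["FilesFormat", "Queries"],
    "Formats": ["Readers"],
    "Readers": ["Writers", "ReadOnly", "Queries", "Groups"],
    "Queries": ["Views", "Reconstruction"],
    "ReadOnly": ["H5dfFormat", "Views", "FilesFormat"],
    "Writers": ["H5dfFormat", "MemoryFormat", "Copies", "Chains",
                "FilesFormat", "Reconstruction"],
    "Views": ["Concat", "Chains", "Contracts"],
    "Contracts": ["Computations"],
    "Computations": ["Adapters"],
    "Copies": ["Concat", "Adapters"],
    "MemoryFormat": ["ExampleData", "AnnDataFormat", "Adapters"],
    "Chains": ["ExampleData", "CompleteDaf", "Adapters"],
    "FilesFormat": ["ExampleData", "CompleteDaf"],
    "H5dfFormat": ["CompleteDaf"],
}

# TopologicalSorter expects each node mapped to its predecessors,
# so invert "A is used by B" into "B depends on A".
depends_on = {}
for module, users in USED_BY.items():
    depends_on.setdefault(module, set())
    for user in users:
        depends_on.setdefault(user, set()).add(module)

# static_order() raises graphlib.CycleError if the graph has a cycle.
order = list(TopologicalSorter(depends_on).static_order())
print(order)  # one valid order; ties between independent modules may vary
```

Any order it prints starts with modules that build on nothing else in the graph (StorageTypes, Tokens, Keys) and ends with leaves such as Adapters or CompleteDaf.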
