Daf

DataAxesFormats.DataAxesFormats Module

The DataAxesFormats package provides a uniform generic interface for accessing 1D and 2D data arranged along some set of axes. This is a much-needed generalization of the AnnData functionality. The key features are:

  • The data model (see StorageTypes) includes (1) axes with named entries, (2) vector data indexed by a single axis, (3) matrix data indexed by a pair of axes, and (4) scalar data (anything not tied to an axis).
  • Explicit control over the layout of 2D data (row-major or column-major), with support for both dense and sparse matrices; both are crucial for performance.
  • Out of the box, data can be stored in memory (using MemoryDaf), directly inside HDF5 files (using H5df), or as a collection of simple files in a directory (using FilesDaf), which works nicely with tools like make for automating computation pipelines.
  • Import and export to/from AnnData (see AnnDataFormat) for interoperability with non-Daf tools.
  • Implementation with a focus on memory-mapping, allowing efficient processing of large data sets (in theory, larger than the system's memory). In particular, merely opening a data set is a fast operation, (almost) regardless of its size.
  • Well-defined interfaces for implementing additional storage Formats.
  • Creating Chains of data sets, allowing zero-copy reuse of common data between multiple computation pipelines.
  • Concatenating multiple data sets into a single data set along one or more axes (Concat).
  • A Query language for accessing the data, providing features such as slicing, aggregation and filtering, and for making Views and Copies based on these queries.
  • Self-documenting Computations with explicit Contracts describing and enforcing their inputs and outputs, and Adapters for applying a computation to data in a different format.
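To make the data model and the idea of chains more concrete, here is a toy sketch in plain Python. It is not Julia and not the DataAxesFormats API: the names ToyDaf, ToyChain, add_axis, set_vector, set_matrix and get_vector are hypothetical, chosen only for illustration. It shows scalars, axes with unique named entries, vectors with one value per axis entry, and matrices indexed by a pair of axes. The chain assumes that when several data sets hold the same property, the later one wins, and it returns the stored object itself rather than a copy, to mimic zero-copy reuse.

```python
# Conceptual sketch of the Daf data model in plain Python.
# NOT the DataAxesFormats (Julia) API; every name here is hypothetical.
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToyDaf:
    """Holds scalars, named axes, per-axis vectors and per-axis-pair matrices."""

    name: str
    scalars: dict[str, Any] = field(default_factory=dict)
    axes: dict[str, list[str]] = field(default_factory=dict)
    # (axis, property) -> one value per entry of the axis
    vectors: dict[tuple[str, str], list[Any]] = field(default_factory=dict)
    # (rows_axis, columns_axis, property) -> list of rows
    matrices: dict[tuple[str, str, str], list[list[Any]]] = field(default_factory=dict)

    def add_axis(self, axis: str, entries: list[str]) -> None:
        # Axis entries are names, so they must be unique.
        if len(set(entries)) != len(entries):
            raise ValueError(f"entries of axis {axis!r} must be unique")
        self.axes[axis] = list(entries)

    def set_vector(self, axis: str, prop: str, values: list[Any]) -> None:
        # A vector has exactly one value per entry of its axis.
        if len(values) != len(self.axes[axis]):
            raise ValueError(f"vector length must match axis {axis!r}")
        self.vectors[(axis, prop)] = list(values)

    def set_matrix(self, rows_axis: str, columns_axis: str, prop: str,
                   rows: list[list[Any]]) -> None:
        # Stored as one inner list per row (a row-major layout in this toy).
        n_rows, n_cols = len(self.axes[rows_axis]), len(self.axes[columns_axis])
        if len(rows) != n_rows or any(len(row) != n_cols for row in rows):
            raise ValueError("matrix shape must match (rows_axis, columns_axis)")
        self.matrices[(rows_axis, columns_axis, prop)] = [list(row) for row in rows]


class ToyChain:
    """Read-only view over several ToyDaf objects; later ones override earlier ones
    (an assumption made for this sketch)."""

    def __init__(self, *dafs: ToyDaf) -> None:
        self.dafs = dafs

    def get_vector(self, axis: str, prop: str) -> list[Any]:
        for daf in reversed(self.dafs):
            if (axis, prop) in daf.vectors:
                return daf.vectors[(axis, prop)]  # the stored object, not a copy
        raise KeyError((axis, prop))


# Usage: a base data set, plus a derived one that overrides a single vector.
base = ToyDaf("base")
base.scalars["organism"] = "human"
base.add_axis("cell", ["c1", "c2", "c3"])
base.add_axis("gene", ["g1", "g2"])
base.set_vector("cell", "age", [3, 5, 7])
base.set_matrix("cell", "gene", "UMIs", [[0, 1], [2, 3], [4, 5]])

derived = ToyDaf("derived")
derived.add_axis("cell", base.axes["cell"])
derived.set_vector("cell", "age", [30, 50, 70])

chain = ToyChain(base, derived)
print(chain.get_vector("cell", "age"))
```

In this toy, the derived data set stores only what it changes, and everything else is read through the chain from the base data set without being copied. A real storage format would also have to deal with memory-mapping, sparse layouts and persistence, which this sketch deliberately ignores.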
Note

The top-level DataAxesFormats module re-exports (almost) everything from the sub-modules, so you can directly access any exported symbol by using DataAxesFormats (or, say, import DataAxesFormats: MemoryDaf), instead of having to import or use qualified names (such as DataAxesFormats.MemoryFormat.MemoryDaf).

The Daf datasets type hierarchy looks like this:

Here are all the internal modules implementing this package and the relationship between them (linking to their documentation). They are also listed in the quick access bar on the left.

Index