Daf

DataAxesFormats.DataAxesFormats Module

The DataAxesFormats package provides a uniform generic interface for accessing 1D and 2D data arranged along some set of axes. This is a much-needed generalization of the AnnData functionality. The key features are:

  • The data model (see StorageTypes) includes (1) some axes with named entries, (2) vector data indexed by a single axis, (3) matrix data indexed by a pair of axes, and also (4) scalar data (anything not tied to some axis).
  • Explicit control over 2D data (row or column major), with support for both dense and sparse matrices, both of which are crucial for performance.
  • Out of the box, data can be stored in memory (using MemoryDaf), directly inside HDF5 files (using H5df), or as a collection of simple files in a directory (using FilesDaf), which works nicely with tools like make for automating computation pipelines.
  • Import and export to/from AnnDataFormat for interoperability with non-Daf tools.
  • Implementation with a focus on memory-mapping to allow for efficient processing of large data sets (in theory, larger than the system's memory). In particular, merely opening a data set is a fast operation (almost) regardless of its size.
  • Well-defined interfaces for implementing additional storage Formats.
  • Creating Chains of data sets, allowing zero-copy reuse of common data between multiple computation pipelines.
  • Concat multiple data sets into a single data set along one or more axes.
  • A Query language for accessing the data, providing features such as slicing, aggregation and filtering, and making Views and Copies based on these queries.
  • Self-documenting Computations with explicit Contracts describing and enforcing the inputs and outputs, and Adapters for applying the computation to data of a different format.
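
To make the data model and the layout point above concrete, here is a minimal sketch in plain Julia. It deliberately does not call any DataAxesFormats function; the axis names ("cell", "gene"), the property names and all values are made up for illustration. It only shows what "axes with named entries", "vector per axis", "matrix per pair of axes" and "scalar" mean, and why column-major versus row-major layout is worth controlling explicitly.

```julia
using SparseArrays  # standard library, used only for the sparse example

# (1) Two axes with named entries (hypothetical names).
cell_names = ["c1", "c2"]
gene_names = ["g1", "g2", "g3"]

# (2) Vector data indexed by a single axis: one value per cell.
age_per_cell = [3, 5]

# (3) Matrix data indexed by a pair of axes: rows are cells, columns are genes.
umis = [1 0 2; 0 0 3]

# (4) Scalar data, not tied to any axis.
version = "1.0"

# The sizes of the vector and the matrix must agree with their axes.
@assert length(age_per_cell) == length(cell_names)
@assert size(umis) == (length(cell_names), length(gene_names))

# Entries are addressed by name through the axes.
cell_index = findfirst(==("c2"), cell_names)
gene_index = findfirst(==("g3"), gene_names)
value = umis[cell_index, gene_index]

# Julia dense matrices are column-major: each column is contiguous in memory.
flat = vec(umis)

# The same data as a sparse matrix (compressed sparse column format),
# which stores only the non-zero entries, column by column.
sparse_umis = sparse(umis)
stored = nonzeros(sparse_umis)

# A row-major view of the same logical matrix: store the transposed data
# and wrap it lazily, so each logical row is contiguous in memory.
row_major = transpose(permutedims(umis))
```

Per-gene operations walk contiguous memory in `umis`, while per-cell operations are better served by the `row_major` variant; this is the kind of trade-off the explicit layout control is meant to expose.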
Note

The top-level DataAxesFormats module re-exports (almost) everything from the sub-modules, so you can directly access any exported symbol with using DataAxesFormats (or, say, import DataAxesFormats: MemoryDaf), instead of having to import or use qualified names (such as DataAxesFormats.MemoryFormat.MemoryDaf).

The Daf datasets type hierarchy looks like this:

Here are all the internal modules implementing this package and the relationship between them (linking to their documentation). They are also listed in the quick access bar on the left.

[Module relationship diagram. Edges as drawn in the graph:]

  • Copies -> Adapters, Concat
  • Writers -> Copies, MemoryFormat, Chains, H5dfFormat, FilesFormat, Reconstruction
  • MemoryFormat -> Adapters, ExampleData, AnnDataFormat
  • Chains -> Adapters, CompleteDaf, ExampleData
  • Computations -> Adapters
  • Views -> Chains, Contracts, Concat
  • ReadOnly -> Views, H5dfFormat, FilesFormat
  • Queries -> Views, Reconstruction
  • Operations -> Queries, FilesFormat
  • Tokens -> Operations, Formats
  • Registry -> Operations, Formats
  • Readers -> Writers, ReadOnly, Queries, Groups
  • H5dfFormat -> CompleteDaf
  • FilesFormat -> CompleteDaf, ExampleData
  • Contracts -> Computations
  • Formats -> Readers
  • Keys -> Formats
  • StorageTypes -> Registry

Index