Storage types
DataAxesFormats.StorageTypes
—
Module
Only a restricted set of scalar, matrix and vector types is stored by
Daf
.
The set of scalar types is restricted because we need to be able to store them in disk files. This rules out compound types such as
Dict
. This isn't an issue for vector and matrix elements but is sometimes bothersome for "scalar" data (not associated with any axis). If you find yourself needing to store such data, you'll have to serialize it to a string. By convention, we use
JSON
blobs for such data to maximize portability between different systems.
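As a minimal sketch of this convention, the snippet below serializes a compound value to a JSON string and parses it back. It assumes the third-party JSON.jl package is installed; the `parameters` dictionary and its keys are hypothetical, and the step that actually stores the string in a data set is left out.

```julia
using JSON  # assumption: the JSON.jl package is installed

# Hypothetical compound value; a Dict is not a valid scalar type.
parameters = Dict("min_cells" => 10, "method" => "umap")

# Serialize to a string, which is a valid scalar type.
blob = JSON.json(parameters)

# Later, parse the string back into a dictionary-like object.
restored = JSON.parse(blob)
```

The key order inside `blob` is not guaranteed, so compare the parsed values rather than the raw strings.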
Julia supports a potentially infinite variety of ways to represent matrices and vectors.
Daf
is intentionally restricted to specific representations. This has several advantages:
- Daf storage formats need only implement storing these restricted representations, which lend themselves to simple storage in consecutive bytes (in memory and/or on disk). These representations also allow for memory-mapping the data from disk files, which allows Daf to deal with data sets larger than the available memory. However, we also allow storing vectors and matrices of strings. We try to make these as efficient as possible (which isn't saying much).
- Client code need only worry about dealing with these restricted representations, which limits the number of code paths required for efficient algorithm implementations. However, you (mostly) need not worry about this when invoking library functions, which have code paths covering all common matrix types. You do need to consider the layout of the data, though (see below).
This has the downside that
Daf
doesn't support efficient storage of specialized matrices (to pick a random example, upper triangular matrices). This isn't a great loss, since
Daf
targets storing arbitrary scientific data (especially biological data), which in general is not of any such special shape. The upside is that all matrices stored and returned by
Daf
have a clear layout (regardless of whether they are dense or sparse). This allows user code to ensure it is working "with the grain" of the data, which is
much
more efficient.
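To illustrate what working "with the grain" means, here is a small sketch using a plain dense Julia matrix, which is column-major (each column is contiguous in memory). It uses only Base Julia and does not touch any Daf API; the variable names are illustrative.

```julia
# Julia's dense matrices are column-major: each column is contiguous in memory.
matrix = Float32[1 2 3; 4 5 6]  # 2 rows, 3 columns

# "With the grain": visit whole columns, so memory is read sequentially.
column_sums = [sum(@view matrix[:, column]) for column in axes(matrix, 2)]

# "Against the grain": each row view strides across all the columns.
row_sums = [sum(@view matrix[row, :]) for row in axes(matrix, 1)]
```

Both loops give correct results; on large matrices the column-wise version is typically faster because it reads consecutive memory.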
Currently all Boolean vectors and matrices are stored internally using one byte per entry (that is, as
Vector{Bool}
and
Matrix{Bool}
rather than
BitVector
and
BitMatrix
). This is somewhat less efficient, but it is simpler, and Boolean data is rarely a significant part of either storage or processing.
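The difference between the two representations can be seen in plain Julia. The sketch below (Base only, with illustrative names) shows that a broadcast comparison produces a packed `BitVector`, and how to convert it to the one-byte-per-entry form described above.

```julia
counts = [3, 8, 1, 9]

# Broadcasting a comparison yields a packed BitVector (one bit per entry).
mask = counts .> 2

# Convert to the one-byte-per-entry representation.
dense_mask = Vector{Bool}(mask)

# A Vector{Bool} occupies exactly one byte per element.
bytes_used = sizeof(dense_mask)
```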
DataAxesFormats.StorageTypes.StorageSigned
—
Type
StorageSigned = Union{Int8, Int16, Int32, Int64}
Signed integer number types that can be used as scalars, or elements in stored matrices or vectors.
DataAxesFormats.StorageTypes.StorageUnsigned
—
Type
StorageUnsigned = Union{UInt8, UInt16, UInt32, UInt64}
Unsigned integer number types that can be used as scalars, or elements in stored matrices or vectors.
DataAxesFormats.StorageTypes.StorageInteger
—
Type
StorageInteger = Union{StorageSigned, StorageUnsigned}
Integer number types that can be used as scalars, or elements in stored matrices or vectors.
DataAxesFormats.StorageTypes.StorageFloat
—
Type
StorageFloat = Union{Float32, Float64}
Floating point number types that can be used as scalars, or elements in stored matrices or vectors.
DataAxesFormats.StorageTypes.StorageReal
—
Type
StorageReal = Union{Bool, StorageInteger, StorageFloat}
Number types that can be used as scalars, or elements in stored matrices or vectors.
DataAxesFormats.StorageTypes.StorageScalar
—
Type
StorageScalar = Union{StorageReal, <:AbstractString}
Types that can be used as scalars, or elements in stored matrices or vectors.
This is restricted to
StorageReal
(including Booleans) and strings. It is arguably too restrictive, as in principle we could support any arbitrary
isbitstype
. However, in practice this would cause much trouble when accessing the data from other systems (specifically Python and R). Since
Daf
targets storing scientific data (especially biological data), as opposed to "anything at all", this restriction seems reasonable.
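The following sketch shows which values satisfy these restrictions. So that it runs without the package, it defines local copies of the unions documented on this page under hypothetical `...Sketch` names; `is_storable_scalar` is likewise a helper invented for illustration, not part of the library.

```julia
# Local copies of the documented aliases (hypothetical names, for illustration).
const SignedSketch = Union{Int8, Int16, Int32, Int64}
const UnsignedSketch = Union{UInt8, UInt16, UInt32, UInt64}
const FloatSketch = Union{Float32, Float64}
const RealSketch = Union{Bool, SignedSketch, UnsignedSketch, FloatSketch}
const ScalarSketch = Union{RealSketch, S} where {S <: AbstractString}

# Hypothetical helper: does this value have a storable scalar type?
is_storable_scalar(value) = value isa ScalarSketch

is_storable_scalar(Int32(1))            # true
is_storable_scalar("label")             # true
is_storable_scalar(true)                # true
is_storable_scalar(Float16(1.5))        # false: only Float32 and Float64 are listed
is_storable_scalar(1 + 2im)             # false: complex numbers are not listed
is_storable_scalar(Dict("a" => 1))      # false: compound type
```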
DataAxesFormats.StorageTypes.StorageScalarBase
—
Type
StorageScalarBase = Union{StorageReal, AbstractString}
For use in
where
clauses when a type needs to be a
StorageScalar
. That is, write
where {T <: StorageScalarBase}
instead of
where {T <: StorageScalar}
, because of the limitations of Julia's type system.
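A minimal sketch of the recommended pattern follows. It uses local stand-ins for the documented aliases (the `...Sketch` names and the `describe_scalar` function are hypothetical) so that it runs without the package.

```julia
# Local stand-ins for the documented aliases (hypothetical names).
const RealSketch = Union{
    Bool, Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32, UInt64, Float32, Float64,
}
const ScalarBaseSketch = Union{RealSketch, AbstractString}

# Constrain the type parameter with the "base" union, as recommended above.
function describe_scalar(value::T) where {T <: ScalarBaseSketch}
    return "$(T): $(value)"
end

describe_scalar(Int8(3))   # "Int8: 3"
describe_scalar("abc")     # "String: abc"
```

Calling `describe_scalar` with a value of any other type (for example a `Dict`) raises a `MethodError`, since no method matches.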
DataAxesFormats.StorageTypes.StorageVector
—
Type
StorageVector{T} = AbstractVector{T} where {T <: StorageScalar}
Vectors that can be directly stored in (and fetched from)
Daf
storage.
The element type must be a
StorageScalar
, to allow storing the data in disk files. Vectors of strings are supported but will be less efficient.
DataAxesFormats.StorageTypes.StorageMatrix
—
Type
StorageMatrix{T} = AbstractMatrix{T} where {T <: StorageScalar}
Matrices that can be directly stored in (and fetched from)
Daf
storage.
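As a rough illustration of what the vector and matrix aliases accept, the sketch below checks a few dense and sparse containers against locally defined equivalents. The `...Sketch` names are hypothetical stand-ins written in the `AbstractVector{<:...}` form, not the package's own definitions.

```julia
using SparseArrays  # standard library

# Local stand-ins for the documented aliases (hypothetical names).
const RealSketch = Union{
    Bool, Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32, UInt64, Float32, Float64,
}
const ScalarSketch = Union{RealSketch, S} where {S <: AbstractString}
const VectorSketch = AbstractVector{<:ScalarSketch}
const MatrixSketch = AbstractMatrix{<:ScalarSketch}

dense_matrix = zeros(Float32, 2, 3)
sparse_matrix = sparse([1, 2], [1, 3], UInt8[5, 7], 2, 3)
labels = ["A", "B"]
mixed = Any[1, "a"]

dense_matrix isa MatrixSketch    # true: dense matrix of Float32
sparse_matrix isa MatrixSketch   # true: sparse matrix of UInt8
labels isa VectorSketch          # true: vector of strings
mixed isa VectorSketch           # false: element type Any is not a storable scalar
```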
Index
- DataAxesFormats.StorageTypes
- DataAxesFormats.StorageTypes.StorageFloat
- DataAxesFormats.StorageTypes.StorageInteger
- DataAxesFormats.StorageTypes.StorageMatrix
- DataAxesFormats.StorageTypes.StorageReal
- DataAxesFormats.StorageTypes.StorageScalar
- DataAxesFormats.StorageTypes.StorageScalarBase
- DataAxesFormats.StorageTypes.StorageSigned
- DataAxesFormats.StorageTypes.StorageUnsigned
- DataAxesFormats.StorageTypes.StorageVector