Formats
DataAxesFormats.Formats (Module)
The FormatReader and FormatWriter interfaces specify a low-level API for storing Daf data. To extend Daf to support an additional format, create a new implementation of this API.
A storage format object contains some named scalar data, a set of axes (each with a unique name for each entry), and named vector and matrix data based on these axes.
Data properties are identified by a unique name given the axes they are based on. That is, there is a separate namespace for scalar properties, vector properties for each specific axis, and matrix properties for each (ordered) pair of axes.
For matrices, we keep careful track of their layout. Specifically, a storage format only deals with column-major matrices, listed under the rows axis first and the columns axis second. A storage format object may hold two copies of the same matrix, one in each possible memory layout, in which case it will be listed twice, once under each axes order.
In general, storage format objects are as "dumb" as possible, to make it easier to support new storage formats. The required functions implement a glorified key-value repository, with the absolutely minimal necessary logic to deal with the separate property namespaces listed above.
For clarity of documentation, we split the type hierarchy into DafWriter <: FormatWriter <: DafReader <: FormatReader.
The functions listed here use FormatReader for read-only operations and FormatWriter for write operations into a Daf storage. This is a low-level API, not meant to be used from outside the package, and therefore is not re-exported from the top-level DataAxesFormats namespace.
In contrast, the functions using DafReader and DafWriter describe the high-level API meant to be used from outside the package, and are re-exported. These functions are listed in the DataAxesFormats.Readers and DataAxesFormats.Writers modules. They provide all the logic common to any storage format, allowing us to keep the format-specific functions as simple as possible.
That is, when implementing a new Daf storage format, you should write struct MyFormat <: DafWriter, and implement the functions listed here for both FormatReader and FormatWriter.
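To make the type hierarchy and the separate property namespaces more concrete, here is a minimal, self-contained Julia sketch. It does not use DataAxesFormats at all: ToyFormatReader, ToyDafReader, ToyFormatWriter, ToyDafWriter and ToyFormat are hypothetical stand-ins that mirror the hierarchy above, with one dictionary per property namespace. A real format would subtype DafWriter itself, carry the required internal data, and implement the functions listed below.

```julia
# Hypothetical stand-ins mirroring the documented hierarchy
# DafWriter <: FormatWriter <: DafReader <: FormatReader.
# These are local placeholders, NOT the DataAxesFormats types.
abstract type ToyFormatReader end
abstract type ToyDafReader <: ToyFormatReader end
abstract type ToyFormatWriter <: ToyDafReader end
abstract type ToyDafWriter <: ToyFormatWriter end

# A toy in-memory format with one dictionary per property namespace.
# (A real format would also need the `internal` data member.)
struct ToyFormat <: ToyDafWriter
    scalars::Dict{String, Any}                                # name => value
    axes::Dict{String, Vector{String}}                        # axis => unique entry names
    vectors::Dict{String, Dict{String, Any}}                  # axis => name => vector
    matrices::Dict{Tuple{String, String}, Dict{String, Any}}  # (rows_axis, columns_axis) => name => matrix
end

ToyFormat() = ToyFormat(
    Dict{String, Any}(),
    Dict{String, Vector{String}}(),
    Dict{String, Dict{String, Any}}(),
    Dict{Tuple{String, String}, Dict{String, Any}}(),
)

# A writer is also a reader, because of the subtyping chain above.
toy = ToyFormat()
```

Keying matrices by an ordered (rows_axis, columns_axis) tuple is one simple way to get a separate namespace per ordered pair of axes.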
Read API
DataAxesFormats.Formats.DafReader (Type)
A high-level abstract interface for read-only access to Daf data.
All the functions for this type are provided based on the functions required for FormatReader. See the Readers module for their description.
DataAxesFormats.Formats.Internal (Type)
struct Internal ... end
Internal data we need to keep in any concrete FormatReader. This has to be available as a .internal data member of the concrete format. This enables all the high-level DafReader and DafWriter functions.
The constructor will automatically call unique_name to try to make the names unique, for improved error messages.
Caching
DataAxesFormats.Formats.CacheGroup (Type)
Types of cached data inside Daf.
- MappedData: memory-mapped disk data. This is the cheapest data, as it doesn't put pressure on the garbage collector. It requires some OS resources to maintain the mapping, and physical memory for the subset of the data that is actually being accessed. That is, one can memory-map data larger than the physical memory, and performance will be good as long as the subset of the data that is actually accessed is small enough to fit in memory. If it isn't, performance will drop (a lot!) because the OS will be continuously reading data pages from disk, but it will not crash due to an out-of-memory error. It is very important not to re-map the same data twice, because that causes all sorts of inefficiencies and edge cases in the hardware and low-level software.
- MemoryData: a copy of data (from disk, or computed). This does put pressure on the garbage collector and can cause out-of-memory errors. However, recomputing or re-fetching the data from disk is slow, so caching this data is crucial for performance.
- QueryData: data that is computed by queries based on stored data (e.g., masked data, or results of a reduction or an element-wise operation). This again takes up application memory and may cause out-of-memory errors, but it is very useful to cache the results when the same query is executed multiple times (e.g., when using views). Manually executing queries therefore allows one to explicitly disable the caching of the query results, since some queries will not be repeated.
If too much data has been cached, call empty_cache! to release it.
DataAxesFormats.Formats.empty_cache! (Function)
empty_cache!(
    daf::DafReader;
    [clear::Maybe{CacheGroup} = nothing,
    keep::Maybe{CacheGroup} = nothing]
)::Nothing
Clear some cached data. By default, completely empties the caches. You can specify either clear, to only forget a specific CacheGroup (e.g., for clearing only QueryData), or keep, to forget everything except a specific CacheGroup (e.g., for keeping only MappedData). You can't specify both clear and keep.
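The clear/keep rules of empty_cache! can be illustrated over a plain dictionary. This is only a sketch: ToyCacheGroup, ToyCache and toy_empty_cache! are hypothetical names, and the actual cache inside Daf is likely organized differently.

```julia
# Hypothetical stand-in for CacheGroup (not the DataAxesFormats type).
@enum ToyCacheGroup ToyMappedData ToyMemoryData ToyQueryData

# Each cached item is tagged with the group it belongs to.
const ToyCache = Dict{String, Tuple{ToyCacheGroup, Any}}

function toy_empty_cache!(
    cache::ToyCache;
    clear::Union{ToyCacheGroup, Nothing} = nothing,
    keep::Union{ToyCacheGroup, Nothing} = nothing,
)::Nothing
    if clear !== nothing && keep !== nothing
        throw(ArgumentError("can't specify both clear and keep"))
    elseif clear !== nothing
        filter!(item -> item.second[1] != clear, cache)  # forget only the `clear` group
    elseif keep !== nothing
        filter!(item -> item.second[1] == keep, cache)   # forget all but the `keep` group
    else
        empty!(cache)                                    # default: forget everything
    end
    return nothing
end

cache = ToyCache("a" => (ToyMappedData, 1), "b" => (ToyQueryData, 2), "c" => (ToyMemoryData, 3))
toy_empty_cache!(cache; clear = ToyQueryData)  # drops only "b"
```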
Description
DataAxesFormats.Formats.format_description_header (Function)
format_description_header(format::FormatReader, lines::Vector{String}, deep::Bool)::Nothing
Allow a format to emit additional description header lines.
This trusts that we have a read lock on the data set.
DataAxesFormats.Formats.format_description_footer (Function)
format_description_footer(format::FormatReader, lines::Vector{String}; cache::Bool, deep::Bool, tensors::Bool)::Nothing
Allow a format to emit additional description footer lines. If deep, this also emits the description of any data sets nested in this one.
This trusts that we have a read lock on the data set.
Scalar properties
DataAxesFormats.Formats.format_has_scalar (Function)
format_has_scalar(format::FormatReader, name::AbstractString)::Bool
Check whether a scalar property with some name exists in format.
This trusts that we have a read lock on the data set.
DataAxesFormats.Formats.format_scalars_set (Function)
format_scalars_set(format::FormatReader)::AbstractSet{<:AbstractString}
The names of the scalar properties in format.
This trusts that we have a read lock on the data set.
DataAxesFormats.Formats.format_get_scalar (Function)
format_get_scalar(format::FormatReader, name::AbstractString)::StorageScalar
Implement fetching the value of a scalar property with some name in format.
This trusts that we have a read lock on the data set, and that the name scalar property exists in format.
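Since a storage format is essentially a key-value repository, the three scalar functions above reduce to dictionary operations in a toy in-memory format. The names ToyScalars, toy_has_scalar, toy_scalars_set and toy_get_scalar are hypothetical; a real format would instead add methods to format_has_scalar, format_scalars_set and format_get_scalar for its own type.

```julia
# Hypothetical format holding only the scalar namespace: name => value.
struct ToyScalars
    scalars::Dict{String, Any}
end

toy_has_scalar(format::ToyScalars, name::AbstractString)::Bool = haskey(format.scalars, name)

toy_scalars_set(format::ToyScalars) = Set(keys(format.scalars))

# Like format_get_scalar, this trusts the caller: a missing name throws a KeyError.
toy_get_scalar(format::ToyScalars, name::AbstractString) = format.scalars[name]

metadata = ToyScalars(Dict{String, Any}("version" => "1.0", "cells_count" => 100))
```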
Data axes
DataAxesFormats.Formats.format_has_axis (Function)
format_has_axis(format::FormatReader, axis::AbstractString; for_change::Bool)::Bool
Check whether some axis exists in format. If for_change, this is done just prior to adding or deleting the axis.
This trusts that we have a read lock on the data set.
DataAxesFormats.Formats.format_axes_set (Function)
format_axes_set(format::FormatReader)::AbstractSet{<:AbstractString}
The names of the axes of format.
This trusts that we have a read lock on the data set.
DataAxesFormats.Formats.format_axis_vector (Function)
format_axis_vector(format::FormatReader, axis::AbstractString)::AbstractVector{<:AbstractString}
Implement fetching the unique names of the entries of some axis of format.
This trusts that we have a read lock on the data set, and that the axis exists in format.
DataAxesFormats.Formats.format_axis_length (Function)
format_axis_length(format::FormatReader, axis::AbstractString)::Int64
Implement fetching the number of entries along the axis.
This trusts that we have a read lock on the data set, and that the axis exists in format.
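The four axis functions above can be sketched the same way, with each axis mapped to the vector of its unique entry names. ToyAxes and the toy_* functions are hypothetical; in particular, this toy simply ignores for_change, whereas a real format might use it (for example) to prepare for the upcoming change.

```julia
# Hypothetical format holding only the axes: axis name => unique entry names.
struct ToyAxes
    axes::Dict{String, Vector{String}}
end

# `for_change` only tells the format why it is being asked; this toy ignores it.
toy_has_axis(format::ToyAxes, axis::AbstractString; for_change::Bool)::Bool =
    haskey(format.axes, axis)

toy_axes_set(format::ToyAxes) = Set(keys(format.axes))

# Both of these trust the caller that `axis` exists (otherwise: KeyError).
toy_axis_vector(format::ToyAxes, axis::AbstractString) = format.axes[axis]
toy_axis_length(format::ToyAxes, axis::AbstractString)::Int64 = length(format.axes[axis])

cells_and_genes = ToyAxes(Dict("cell" => ["C1", "C2", "C3"], "gene" => ["G1", "G2"]))
```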
Vector properties
DataAxesFormats.Formats.format_has_vector (Function)
format_has_vector(format::FormatReader, axis::AbstractString, name::AbstractString)::Bool
Implement checking whether a vector property with some name exists for the axis in format.
This trusts that we have a read lock on the data set, that the axis exists in format, and that the property name isn't "name".
DataAxesFormats.Formats.format_vectors_set (Function)
format_vectors_set(format::FormatReader, axis::AbstractString)::AbstractSet{<:AbstractString}
Implement fetching the names of the vectors for the axis in format, not including the special "name" property.
This trusts that we have a read lock on the data set, and that the axis exists in format.
DataAxesFormats.Formats.format_get_vector (Function)
format_get_vector(format::FormatReader, axis::AbstractString, name::AbstractString)::StorageVector
Implement fetching the vector property with some name for some axis in format.
This trusts that we have a read lock on the data set, that the axis exists in format, and that the name vector property exists for the axis.
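A toy sketch of the vector-reading functions, with a separate namespace per axis. It assumes (as one possible design, not necessarily the package's) that the entry names are kept only with the axis, so the special "name" property never appears among the stored vectors. ToyVectors and the toy_* functions are hypothetical.

```julia
# Hypothetical format: axis => entry names, and axis => property name => vector.
struct ToyVectors
    axes::Dict{String, Vector{String}}
    vectors::Dict{String, Dict{String, Any}}
end

# The entry names live in `axes`, never in `vectors`, so the special "name"
# property is naturally excluded from the stored vector properties.
toy_has_vector(format::ToyVectors, axis::AbstractString, name::AbstractString)::Bool =
    haskey(format.vectors[axis], name)

toy_vectors_set(format::ToyVectors, axis::AbstractString) = Set(keys(format.vectors[axis]))

toy_get_vector(format::ToyVectors, axis::AbstractString, name::AbstractString) =
    format.vectors[axis][name]

cells = ToyVectors(
    Dict("cell" => ["C1", "C2"]),
    Dict("cell" => Dict{String, Any}("age" => [3, 5])),
)
```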
Matrix properties
DataAxesFormats.Formats.format_has_matrix (Function)
format_has_matrix(
    format::FormatReader,
    rows_axis::AbstractString,
    columns_axis::AbstractString,
    name::AbstractString;
)::Bool
Implement checking whether a matrix property with some name exists for the rows_axis and the columns_axis in format. If cache, this also checks whether the matrix exists in the cache.
This trusts that we have a read lock on the data set, and that the rows_axis and the columns_axis exist in format.
DataAxesFormats.Formats.format_matrices_set (Function)
format_matrices_set(
    format::FormatReader,
    rows_axis::AbstractString,
    columns_axis::AbstractString,
)::AbstractSet{<:AbstractString}
Implement fetching the names of the matrix properties for the rows_axis and columns_axis in format.
This trusts that we have a read lock on the data set, and that the rows_axis and columns_axis exist in format.
DataAxesFormats.Formats.format_get_matrix (Function)
format_get_matrix(
    format::FormatReader,
    rows_axis::AbstractString,
    columns_axis::AbstractString,
    name::AbstractString
)::StorageMatrix
Implement fetching the matrix property with some name for some rows_axis and columns_axis in format.
This trusts that we have a read lock on the data set, that the rows_axis and columns_axis exist in format, and that the name matrix property exists for them.
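The per-ordered-pair namespace, and the possibility of holding the same matrix under both axes orders, can be sketched with a dictionary keyed by (rows_axis, columns_axis). ToyMatrices and the toy_* functions are hypothetical; every stored matrix is a plain Julia Matrix, which is column-major.

```julia
# Hypothetical layout-aware store: keys are *ordered* (rows_axis, columns_axis) pairs,
# and every stored matrix is column-major (Julia's native Matrix layout).
struct ToyMatrices
    matrices::Dict{Tuple{String, String}, Dict{String, Matrix{Float64}}}
end

const NO_MATRICES = Dict{String, Matrix{Float64}}()

toy_has_matrix(f::ToyMatrices, rows_axis::AbstractString, columns_axis::AbstractString, name::AbstractString)::Bool =
    haskey(get(f.matrices, (rows_axis, columns_axis), NO_MATRICES), name)

toy_matrices_set(f::ToyMatrices, rows_axis::AbstractString, columns_axis::AbstractString) =
    Set(keys(get(f.matrices, (rows_axis, columns_axis), NO_MATRICES)))

# Trusts the caller that the property exists for this ordered pair of axes.
toy_get_matrix(f::ToyMatrices, rows_axis::AbstractString, columns_axis::AbstractString, name::AbstractString) =
    f.matrices[(rows_axis, columns_axis)][name]

umis = [1.0 2.0 3.0; 4.0 5.0 6.0]  # 2 cells x 3 genes
store = ToyMatrices(Dict(
    ("cell", "gene") => Dict("UMIs" => umis),
    ("gene", "cell") => Dict("UMIs" => permutedims(umis)),  # same data, other layout
))
```

Because the same UMIs data is stored under both orders, it is listed twice, once for ("cell", "gene") and once for ("gene", "cell").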
Write API
DataAxesFormats.Formats.DafWriter (Type)
A high-level abstract interface for write access to Daf data.
All the functions for this type are provided based on the functions required for FormatWriter. See the Writers module for their description.
DataAxesFormats.Formats.FormatWriter (Type)
An abstract interface for writing into Daf storage formats.
Each storage format must implement the functions listed below for writing into the storage.
Scalar properties
DataAxesFormats.Formats.format_set_scalar! (Function)
format_set_scalar!(
    format::FormatWriter,
    name::AbstractString,
    value::StorageScalar,
)::Nothing
Implement setting the value of a scalar property with some name in format.
This trusts that we have a write lock on the data set, and that the name scalar property does not exist in format.
DataAxesFormats.Formats.format_delete_scalar! (Function)
format_delete_scalar!(
    format::FormatWriter,
    name::AbstractString;
    for_set::Bool
)::Nothing
Implement deleting a scalar property with some name from format. If for_set, this is done just prior to setting the scalar with a different value.
This trusts that we have a write lock on the data set, and that the name scalar property exists in format.
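A toy sketch of the two scalar write functions, including the delete-then-set sequence that for_set signals. ToyScalarStore, toy_set_scalar! and toy_delete_scalar! are hypothetical; this toy treats for_set as a pure hint and deletes the same way either way.

```julia
# Hypothetical store holding only the scalar namespace.
struct ToyScalarStore
    scalars::Dict{String, Any}
end

# Trusts the caller: `name` does not exist yet (the high-level API checks this).
function toy_set_scalar!(format::ToyScalarStore, name::AbstractString, value)::Nothing
    format.scalars[name] = value
    return nothing
end

# Trusts the caller: `name` exists. `for_set` is only a hint; this toy ignores it.
function toy_delete_scalar!(format::ToyScalarStore, name::AbstractString; for_set::Bool)::Nothing
    delete!(format.scalars, name)
    return nothing
end

# Replacing a value: delete with for_set = true, then set again.
scalar_store = ToyScalarStore(Dict{String, Any}())
toy_set_scalar!(scalar_store, "version", "1.0")
toy_delete_scalar!(scalar_store, "version"; for_set = true)
toy_set_scalar!(scalar_store, "version", "2.0")
```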
Data axes
DataAxesFormats.Formats.format_add_axis! (Function)
format_add_axis!(
    format::FormatWriter,
    axis::AbstractString,
    entries::AbstractVector{<:AbstractString}
)::Nothing
Implement adding a new axis to format.
This trusts that we have a write lock on the data set, that the axis does not already exist in format, and that the names of the entries are unique.
DataAxesFormats.Formats.format_delete_axis! (Function)
format_delete_axis!(format::FormatWriter, axis::AbstractString)::Nothing
Implement deleting some axis from format.
This trusts that we have a write lock on the data set, that the axis exists in format, and that all properties that are based on this axis have already been deleted.
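A toy sketch of adding and deleting an axis. ToyAxisStore, toy_add_axis! and toy_delete_axis! are hypothetical; note how deletion does no cleanup of dependent properties, relying (as documented above) on the caller having deleted them already.

```julia
# Hypothetical store: axis => entry names, and axis => property name => vector.
struct ToyAxisStore
    axes::Dict{String, Vector{String}}
    vectors::Dict{String, Dict{String, Any}}
end

function toy_add_axis!(format::ToyAxisStore, axis::AbstractString, entries::AbstractVector{<:AbstractString})::Nothing
    # Trusts the caller: the axis is new and the entry names are unique.
    format.axes[axis] = collect(String, entries)  # private copy of the entry names
    format.vectors[axis] = Dict{String, Any}()    # empty vector namespace for this axis
    return nothing
end

function toy_delete_axis!(format::ToyAxisStore, axis::AbstractString)::Nothing
    # Trusts the caller: every property based on this axis was already deleted.
    delete!(format.axes, axis)
    delete!(format.vectors, axis)
    return nothing
end

axis_store = ToyAxisStore(Dict{String, Vector{String}}(), Dict{String, Dict{String, Any}}())
toy_add_axis!(axis_store, "gene", ["G1", "G2", "G3"])
```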
Vector properties
DataAxesFormats.Formats.format_set_vector! (Function)
format_set_vector!(
    format::FormatWriter,
    axis::AbstractString,
    name::AbstractString,
    vector::Union{StorageScalar, StorageVector},
)::Nothing
Implement setting a vector property with some name for some axis in format.
If the vector specified is actually a StorageScalar, the stored vector is filled with this value.
This trusts that we have a write lock on the data set, that the axis exists in format, that the vector property name isn't "name", that it does not exist for the axis, and that the vector has the appropriate length for it.
DataAxesFormats.Formats.format_delete_vector! (Function)
format_delete_vector!(
    format::FormatWriter,
    axis::AbstractString,
    name::AbstractString;
    for_set::Bool
)::Nothing
Implement deleting a vector property with some name for some axis from format. If for_set, this is done just prior to setting the vector with a different value.
This trusts that we have a write lock on the data set, that the axis exists in format, that the vector property name isn't "name", and that the name vector exists for the axis.
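A toy sketch of setting and deleting vectors, including the rule that a scalar value means "fill the whole vector with this value". ToyVectorStore, toy_set_vector! and toy_delete_vector! are hypothetical, and the AbstractVector check is a simplification standing in for the StorageScalar/StorageVector distinction.

```julia
# Hypothetical store: axis => entry names, and axis => property name => vector.
struct ToyVectorStore
    axes::Dict{String, Vector{String}}
    vectors::Dict{String, Dict{String, Any}}
end

function toy_set_vector!(format::ToyVectorStore, axis::AbstractString, name::AbstractString, vector)::Nothing
    if vector isa AbstractVector
        stored = copy(vector)  # trusts the caller that the length matches the axis
    else
        # A scalar means "fill the whole vector with this value".
        stored = fill(vector, length(format.axes[axis]))
    end
    format.vectors[axis][name] = stored
    return nothing
end

function toy_delete_vector!(format::ToyVectorStore, axis::AbstractString, name::AbstractString; for_set::Bool)::Nothing
    delete!(format.vectors[axis], name)  # `for_set` is ignored by this toy
    return nothing
end

vector_store = ToyVectorStore(Dict("cell" => ["C1", "C2", "C3"]), Dict("cell" => Dict{String, Any}()))
toy_set_vector!(vector_store, "cell", "weight", 0.5)
toy_set_vector!(vector_store, "cell", "age", [3, 5, 8])
```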
Matrix properties
DataAxesFormats.Formats.format_set_matrix! (Function)
format_set_matrix!(
    format::FormatWriter,
    rows_axis::AbstractString,
    columns_axis::AbstractString,
    name::AbstractString,
    matrix::Union{StorageScalar, StorageMatrix},
)::Nothing
Implement setting the matrix property with some name for some rows_axis and columns_axis in format.
If the matrix specified is actually a StorageScalar, the stored matrix is filled with this value.
This trusts that we have a write lock on the data set, that the rows_axis and columns_axis exist in format, that the name matrix property does not exist for them, and that the matrix is column-major of the appropriate size for it.
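A toy sketch of setting a matrix, storing it column-major under the ordered pair of axes, and filling it when given a scalar. ToyMatrixStore and toy_set_matrix! are hypothetical; converting with Matrix(...) is just one way to guarantee a dense column-major copy (for example, of a transpose wrapper).

```julia
# Hypothetical store: axis => entry names, and (rows_axis, columns_axis) => name => matrix.
struct ToyMatrixStore
    axes::Dict{String, Vector{String}}
    matrices::Dict{Tuple{String, String}, Dict{String, Any}}
end

function toy_set_matrix!(format::ToyMatrixStore, rows_axis::AbstractString, columns_axis::AbstractString,
                         name::AbstractString, matrix)::Nothing
    if matrix isa AbstractMatrix
        stored = Matrix(matrix)  # dense column-major copy; trusts the size matches the axes
    else
        # A scalar means "fill the whole matrix with this value".
        stored = fill(matrix, length(format.axes[rows_axis]), length(format.axes[columns_axis]))
    end
    get!(format.matrices, (rows_axis, columns_axis), Dict{String, Any}())[name] = stored
    return nothing
end

matrix_store = ToyMatrixStore(
    Dict("cell" => ["C1", "C2"], "gene" => ["G1", "G2", "G3"]),
    Dict{Tuple{String, String}, Dict{String, Any}}(),
)
toy_set_matrix!(matrix_store, "cell", "gene", "zeros", 0.0)
toy_set_matrix!(matrix_store, "cell", "gene", "UMIs", transpose([1 2; 3 4; 5 6]))
```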
DataAxesFormats.Formats.format_relayout_matrix! (Function)
format_relayout_matrix!(
    format::FormatWriter,
    rows_axis::AbstractString,
    columns_axis::AbstractString,
    name::AbstractString,
    matrix::StorageMatrix,
)::StorageMatrix
Implement calling relayout! on the existing column-major name matrix property for the rows_axis and the columns_axis, and storing the result as a row-major matrix property (that is, under the flipped axes).
This trusts that we have a write lock on the data set, that the rows_axis and columns_axis are different from each other and exist in format, that the name matrix property exists for them, and that it does not exist for the flipped axes.
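The relayout step can be sketched with Base's permutedims, used here as a stand-in for relayout!: it produces a new column-major array of the transposed shape, which is then stored under the flipped (columns_axis, rows_axis) key. Relative to the original axes order, that copy is the row-major layout. toy_relayout_matrix! and the dictionary layout are hypothetical.

```julia
# Hypothetical: matrices keyed by ordered (rows_axis, columns_axis), all column-major.
function toy_relayout_matrix!(
    matrices::Dict{Tuple{String, String}, Dict{String, Any}},
    rows_axis::AbstractString,
    columns_axis::AbstractString,
    name::AbstractString,
    matrix::AbstractMatrix,
)::AbstractMatrix
    relayout = permutedims(matrix)  # new column-major array with the transposed shape
    get!(matrices, (columns_axis, rows_axis), Dict{String, Any}())[name] = relayout
    return relayout
end

umis = [1 2 3; 4 5 6]  # 2 cells x 3 genes, column-major
matrices = Dict{Tuple{String, String}, Dict{String, Any}}(("cell", "gene") => Dict{String, Any}("UMIs" => umis))
toy_relayout_matrix!(matrices, "cell", "gene", "UMIs", umis)
```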
DataAxesFormats.Formats.format_delete_matrix! (Function)
format_delete_matrix!(
    format::FormatWriter,
    rows_axis::AbstractString,
    columns_axis::AbstractString,
    name::AbstractString;
    for_set::Bool
)::StorageMatrix
Implement deleting a matrix property with some name for some rows_axis and columns_axis from format. If for_set, this is done just prior to setting the matrix with a different value.
This trusts that we have a write lock on the data set, that the rows_axis and columns_axis exist in format, and that the name matrix property exists for them.
Creating properties
DataAxesFormats.Formats.format_get_empty_dense_vector! (Function)
format_get_empty_dense_vector!(
    format::FormatWriter,
    axis::AbstractString,
    name::AbstractString,
    eltype::Type{T},
)::Vector{T} where {T <: StorageReal}
Implement creating an empty dense vector property with some name for some axis in format.
This trusts that we have a write lock on the data set, that the axis exists in format, that the vector property name isn't "name", and that it does not exist for the axis.
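The signature suggests that the caller receives the storage and fills it in place, presumably to avoid an extra copy; that reading is an inference, not something stated above. A toy sketch under that assumption, where toy_get_empty_dense_vector! is hypothetical and takes an explicit axis length instead of an axis name:

```julia
# Hypothetical: register uninitialized storage as the property, then hand it to the caller.
function toy_get_empty_dense_vector!(
    vectors::Dict{String, Any},
    axis_length::Integer,
    name::AbstractString,
    ::Type{T},
)::Vector{T} where {T <: Real}
    vector = Vector{T}(undef, axis_length)  # contents are arbitrary until filled
    vectors[name] = vector                  # stored by reference: filling it fills the property
    return vector
end

vectors = Dict{String, Any}()
scores = toy_get_empty_dense_vector!(vectors, 4, "score", Float32)
scores .= 1:4  # the caller fills the storage in place
```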
DataAxesFormats.Formats.format_get_empty_sparse_vector! (Function)
format_get_empty_sparse_vector!(
    format::FormatWriter,
    axis::AbstractString,
    name::AbstractString,
    eltype::Type{T},
    nnz::StorageInteger,
    indtype::Type{I},
)::Tuple{AbstractVector{I}, AbstractVector{T}, Any} where {T <: StorageReal, I <: StorageInteger}
Implement creating an empty sparse vector property with some name for some axis in format.
This trusts that we have a write lock on the data set, that the axis exists in format, that the vector property name isn't "name", and that it does not exist for the axis.
DataAxesFormats.Formats.format_filled_empty_sparse_vector! (Function)
format_filled_empty_sparse_vector!(
    format::FormatWriter,
    axis::AbstractString,
    name::AbstractString,
    filled::SparseVector{<:StorageReal, <:StorageInteger},
)::Nothing
Allow the format to perform caching once the empty sparse vector has been filled. By default this does nothing.
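A toy sketch of the sparse-vector round trip: allocate the nonzero arrays, let the caller fill them, wrap them in a SparseVector, and notify the format. It assumes the returned tuple holds the nonzero indices, then the nonzero values, then format-private extra data. The toy_* functions and TOY_VECTORS are hypothetical, and unlike the documented API this toy only records the property in the "filled" hook.

```julia
using SparseArrays

# Hypothetical vector namespace of one axis, for this sketch only.
const TOY_VECTORS = Dict{String, Any}()

function toy_get_empty_sparse_vector!(
    name::AbstractString,  # unused by this toy
    ::Type{T},
    nnz::Integer,
    ::Type{I},
)::Tuple{Vector{I}, Vector{T}, Any} where {T <: Real, I <: Integer}
    nzind = Vector{I}(undef, nnz)   # positions of the nonzero entries
    nzval = Vector{T}(undef, nnz)   # values of the nonzero entries
    return (nzind, nzval, nothing)  # third slot: format-private extra data (unused here)
end

function toy_filled_empty_sparse_vector!(name::AbstractString, filled::SparseVector)::Nothing
    TOY_VECTORS[name] = filled      # e.g., remember the assembled vector
    return nothing
end

# Caller side: fill the arrays, wrap them, and notify the format.
nzind, nzval, _ = toy_get_empty_sparse_vector!("marker", Float64, 2, Int32)
nzind .= [2, 5]
nzval .= [0.5, 1.5]
toy_filled_empty_sparse_vector!("marker", SparseVector(6, nzind, nzval))
```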
DataAxesFormats.Formats.format_get_empty_dense_matrix! (Function)
format_get_empty_dense_matrix!(
    format::FormatWriter,
    rows_axis::AbstractString,
    columns_axis::AbstractString,
    name::AbstractString,
    eltype::Type{T},
)::AbstractMatrix{T} where {T <: StorageReal}
Implement creating an empty dense matrix property with some name for some rows_axis and columns_axis in format.
This trusts that we have a write lock on the data set, that the rows_axis and columns_axis exist in format, and that the name matrix property does not exist for them.
DataAxesFormats.Formats.format_get_empty_sparse_matrix! (Function)
format_get_empty_sparse_matrix!(
    format::FormatWriter,
    rows_axis::AbstractString,
    columns_axis::AbstractString,
    name::AbstractString,
    eltype::Type{T},
    indtype::Type{I},
    nnz::StorageInteger,
)::Tuple{AbstractVector{I}, AbstractVector{I}, AbstractVector{T}, Any} where {T <: StorageReal, I <: StorageInteger}
Implement creating an empty sparse matrix property with some name for some rows_axis and columns_axis in format.
This trusts that we have a write lock on the data set, that the rows_axis and columns_axis exist in format, and that the name matrix property does not exist for them.
DataAxesFormats.Formats.format_filled_empty_sparse_matrix! (Function)
format_filled_empty_sparse_matrix!(
    format::FormatWriter,
    rows_axis::AbstractString,
    columns_axis::AbstractString,
    name::AbstractString,
    filled::SparseMatrixCSC{<:StorageReal, <:StorageInteger},
)::Nothing
Allow the format to perform caching once the empty sparse matrix has been filled. By default this does nothing.
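A toy sketch of filling an empty sparse matrix in compressed sparse column form. It assumes the first three tuple elements are colptr, rowval and nzval, in the order the SparseMatrixCSC constructor takes them, followed by format-private extra data; that order is not stated above. toy_get_empty_sparse_matrix! is hypothetical and takes a column count instead of axis names.

```julia
using SparseArrays

function toy_get_empty_sparse_matrix!(
    n_columns::Integer,
    ::Type{T},
    ::Type{I},
    nnz::Integer,
)::Tuple{Vector{I}, Vector{I}, Vector{T}, Any} where {T <: Real, I <: Integer}
    colptr = Vector{I}(undef, n_columns + 1)  # where each column starts in rowval/nzval
    rowval = Vector{I}(undef, nnz)            # row index of each nonzero
    nzval = Vector{T}(undef, nnz)             # value of each nonzero
    return (colptr, rowval, nzval, nothing)   # last slot: format-private extra data
end

# Caller side: fill the CSC arrays of [1 0; 0 2; 3 0] (3 rows, 2 columns).
colptr, rowval, nzval, _ = toy_get_empty_sparse_matrix!(2, Float64, Int32, 3)
colptr .= [1, 3, 4]
rowval .= [1, 3, 2]
nzval .= [1.0, 3.0, 2.0]
filled = SparseMatrixCSC(3, 2, colptr, rowval, nzval)
```

A real format would then pass filled to its format_filled_empty_sparse_matrix! method, giving it a chance to cache the assembled matrix.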
Index
- DataAxesFormats.Formats
- DataAxesFormats.Formats.CacheGroup
- DataAxesFormats.Formats.DafReader
- DataAxesFormats.Formats.DafWriter
- DataAxesFormats.Formats.FormatReader
- DataAxesFormats.Formats.FormatWriter
- DataAxesFormats.Formats.Internal
- DataAxesFormats.Formats.empty_cache!
- DataAxesFormats.Formats.format_add_axis!
- DataAxesFormats.Formats.format_axes_set
- DataAxesFormats.Formats.format_axis_length
- DataAxesFormats.Formats.format_axis_vector
- DataAxesFormats.Formats.format_delete_axis!
- DataAxesFormats.Formats.format_delete_matrix!
- DataAxesFormats.Formats.format_delete_scalar!
- DataAxesFormats.Formats.format_delete_vector!
- DataAxesFormats.Formats.format_description_footer
- DataAxesFormats.Formats.format_description_header
- DataAxesFormats.Formats.format_filled_empty_sparse_matrix!
- DataAxesFormats.Formats.format_filled_empty_sparse_vector!
- DataAxesFormats.Formats.format_get_empty_dense_matrix!
- DataAxesFormats.Formats.format_get_empty_dense_vector!
- DataAxesFormats.Formats.format_get_empty_sparse_matrix!
- DataAxesFormats.Formats.format_get_empty_sparse_vector!
- DataAxesFormats.Formats.format_get_matrix
- DataAxesFormats.Formats.format_get_scalar
- DataAxesFormats.Formats.format_get_vector
- DataAxesFormats.Formats.format_has_axis
- DataAxesFormats.Formats.format_has_matrix
- DataAxesFormats.Formats.format_has_scalar
- DataAxesFormats.Formats.format_has_vector
- DataAxesFormats.Formats.format_matrices_set
- DataAxesFormats.Formats.format_relayout_matrix!
- DataAxesFormats.Formats.format_scalars_set
- DataAxesFormats.Formats.format_set_matrix!
- DataAxesFormats.Formats.format_set_scalar!
- DataAxesFormats.Formats.format_set_vector!
- DataAxesFormats.Formats.format_vectors_set