Writers

DataAxesFormats.Writers Module

The DafWriter interface specifies a high-level API for writing Daf data. This API is implemented here, on top of the low-level FormatWriter API. It extends the DafReader API and provides thread safety for reading and writing the same data set from multiple threads, so the low-level API can (mostly) ignore this issue.

Scalar properties

DataAxesFormats.Writers.set_scalar! Function
set_scalar!(
    daf::DafWriter,
    name::AbstractString,
    value::StorageScalar;
    [overwrite::Bool = false]
)::Nothing

Set the value of a scalar property with some name in daf.

If not overwrite (the default), this first verifies the name scalar property does not exist.

cells = example_cells_daf()
set_scalar!(cells, "version", 1.0)
println(get_scalar(cells, "version"))
set_scalar!(cells, "version", 2.0; overwrite = true)
println(get_scalar(cells, "version"))

# output

1.0
2.0

DataAxesFormats.Writers.delete_scalar! Function
delete_scalar!(
    daf::DafWriter,
    name::AbstractString;
    must_exist::Bool = true,
)::Nothing

Delete a scalar property with some name from daf.

If must_exist (the default), this first verifies the name scalar property exists in daf.

cells = example_cells_daf()
println(has_scalar(cells, "organism"))
delete_scalar!(cells, "organism")
println(has_scalar(cells, "organism"))

# output

true
false

Writers axes

DataAxesFormats.Writers.add_axis! Function
add_axis!(
    daf::DafWriter,
    axis::AbstractString,
    entries::AbstractVector{<:AbstractString};
    overwrite::Bool = false,
)::Nothing

Add a new axis to daf.

This verifies the entries are unique. If overwrite, this will first delete an existing axis with the same name (which will also delete any data associated with this axis!). Otherwise, this verifies the axis does not exist.

cells = example_cells_daf()
println(has_axis(cells, "block"))
add_axis!(cells, "block", ["B1", "B2"])
println(has_axis(cells, "block"))

# output

false
true

DataAxesFormats.Writers.delete_axis! Function
delete_axis!(
    daf::DafWriter,
    axis::AbstractString;
    must_exist::Bool = true,
)::Nothing

Delete an axis from the daf. This will also delete any vector or matrix properties that are based on this axis.

If must_exist (the default), this first verifies the axis exists in the daf.

metacells = example_metacells_daf()
println(has_axis(metacells, "type"))
delete_axis!(metacells, "type")
println(has_axis(metacells, "type"))

# output

true
false

Vector properties

DataAxesFormats.Writers.set_vector! Function
set_vector!(
    daf::DafWriter,
    axis::AbstractString,
    name::AbstractString,
    vector::Union{StorageScalar, StorageVector};
    [eltype::Maybe{Type{<:StorageReal}} = nothing,
    overwrite::Bool = false]
)::Nothing

Set a vector property with some name for some axis in daf.

If the vector specified is actually a StorageScalar, the stored vector is filled with this value.

This first verifies the axis exists in daf, that the property name isn't "name" (which is reserved for the names of the axis entries), and that the vector has the appropriate length. If not overwrite (the default), this also verifies the name vector does not exist for the axis.

If eltype is specified, and the data is of another type, then the data is converted to this data type before being stored.

metacells = example_metacells_daf()
println(has_vector(metacells, "type", "is_mebemp"))
set_vector!(metacells, "type", "is_mebemp", [true, true, false, false])
println(has_vector(metacells, "type", "is_mebemp"))
set_vector!(metacells, "type", "is_mebemp", [true, true, true, false]; overwrite = true)
println(has_vector(metacells, "type", "is_mebemp"))

# output

false
true
true
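
The scalar fill and eltype conversion described above can be sketched as follows. This is a minimal illustration, not part of the original examples: it assumes the package's in-memory MemoryDaf format is available, and the axis and property names ("cell", "is_doublet", "score") are invented for the example.

```julia
using DataAxesFormats

# Assumption: MemoryDaf is the in-memory format provided by the same package.
# The axis and property names below are hypothetical.
daf = MemoryDaf(; name = "example")
add_axis!(daf, "cell", ["C1", "C2", "C3"])

# A scalar value is expanded to a vector with one entry per axis entry.
set_vector!(daf, "cell", "is_doublet", false)
println(length(get_vector(daf, "cell", "is_doublet")))

# Ask for the integer data to be converted to Float32 before it is stored.
set_vector!(daf, "cell", "score", [1, 2, 3]; eltype = Float32)
println(eltype(get_vector(daf, "cell", "score")))
```

If the behavior matches the description, the first vector has three entries and the second reports Float32 as its element type.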

DataAxesFormats.Writers.delete_vector! Function
delete_vector!(
    daf::DafWriter,
    axis::AbstractString,
    name::AbstractString;
    must_exist::Bool = true,
)::Nothing

Delete a vector property with some name for some axis from daf.

This first verifies the axis exists in daf and that the property name isn't "name". If must_exist (the default), this also verifies the name vector exists for the axis.

metacells = example_metacells_daf()
println(has_vector(metacells, "type", "color"))
delete_vector!(metacells, "type", "color")
println(has_vector(metacells, "type", "color"))

# output

true
false

Matrix properties

DataAxesFormats.Writers.set_matrix! Function
set_matrix!(
    daf::DafWriter,
    rows_axis::AbstractString,
    columns_axis::AbstractString,
    name::AbstractString,
    matrix::Union{StorageScalarBase, StorageMatrix};
    [eltype::Maybe{Type{<:StorageScalarBase}} = nothing,
    overwrite::Bool = false,
    relayout::Bool = true]
)::Nothing

Set the matrix property with some name for some rows_axis and columns_axis in daf. Since this is Julia, this should be a column-major matrix.

If the matrix specified is actually a StorageScalar, the stored matrix is filled with this value.

If relayout (the default), this will also automatically relayout! the matrix and store the result, so the data is also stored in row-major layout (that is, with the axes flipped), similarly to calling relayout_matrix!.

This first verifies the rows_axis and columns_axis exist in daf, and that the matrix is column-major and of the appropriate size. If not overwrite (the default), this also verifies the name matrix does not exist for the rows_axis and columns_axis.

metacells = example_metacells_daf()
println(has_matrix(metacells, "gene", "metacell", "confidence"))
println(has_matrix(metacells, "gene", "metacell", "confidence"; relayout = false))
println(has_matrix(metacells, "metacell", "gene", "confidence"; relayout = false))

set_matrix!(metacells, "metacell", "gene", "confidence", rand(7, 683); relayout = false)
println()
println(has_matrix(metacells, "gene", "metacell", "confidence"))
println(has_matrix(metacells, "gene", "metacell", "confidence"; relayout = false))
println(has_matrix(metacells, "metacell", "gene", "confidence"; relayout = false))

set_matrix!(metacells, "metacell", "gene", "confidence", rand(7, 683); overwrite = true)
println()
println(has_matrix(metacells, "gene", "metacell", "confidence"))
println(has_matrix(metacells, "gene", "metacell", "confidence"; relayout = false))
println(has_matrix(metacells, "metacell", "gene", "confidence"; relayout = false))

# output

false
false
false

true
false
true

true
true
true

DataAxesFormats.Writers.relayout_matrix! Function
relayout_matrix!(
    daf::DafWriter,
    rows_axis::AbstractString,
    columns_axis::AbstractString,
    name::AbstractString;
    [overwrite::Bool = false]
)::Nothing

Given that a matrix property with some name exists (in column-major layout) in daf for the rows_axis and the columns_axis, relayout! it and store the row-major result as well (that is, with flipped axes).

This is useful after calling empty_dense_matrix! or empty_sparse_matrix! to ensure both layouts of the matrix are stored in daf. When calling set_matrix!, it is simpler to just specify (the default) relayout = true.

This first verifies the rows_axis and columns_axis exist in daf, and that there is a name (column-major) matrix property for them. If not overwrite (the default), this also verifies the name matrix does not exist for the flipped rows_axis and columns_axis.

Note

A restriction of the way Daf stores data is that square data is stored in only one (column-major) layout. For example, to store a weighted directed graph between cells, you may store an outgoing weights matrix where each cell's column holds the outgoing weights from the cell to the other cells. In this case you can't ask Daf to relayout the matrix to row-major order so that each cell's row would hold the incoming weights from the other cells. Instead, you would need to explicitly store a separate incoming weights matrix where each cell's column holds the incoming weights.
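
As a rough sketch of how relayout_matrix! fits with the other functions, the following stores a non-square matrix in a single layout and then adds the flipped one. It assumes the package's in-memory MemoryDaf format; the axis names, entries and the "UMIs" property are invented for the illustration.

```julia
using DataAxesFormats

# Assumption: MemoryDaf is the in-memory format provided by the same package.
# The axes, entries and property name below are hypothetical.
daf = MemoryDaf(; name = "example")
add_axis!(daf, "cell", ["C1", "C2", "C3"])
add_axis!(daf, "gene", ["G1", "G2"])

# Store only the (cell, gene) layout of a 3 x 2 matrix.
set_matrix!(daf, "cell", "gene", "UMIs", UInt8[1 2; 3 4; 5 6]; relayout = false)
println(has_matrix(daf, "gene", "cell", "UMIs"; relayout = false))

# Store the flipped (gene, cell) layout as well.
relayout_matrix!(daf, "cell", "gene", "UMIs")
println(has_matrix(daf, "gene", "cell", "UMIs"; relayout = false))
```

Under the semantics described above, the flipped layout should be reported as missing before the relayout_matrix! call and as present after it.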

DataAxesFormats.Writers.delete_matrix! Function
delete_matrix!(
    daf::DafWriter,
    rows_axis::AbstractString,
    columns_axis::AbstractString,
    name::AbstractString;
    [must_exist::Bool = true,
    relayout::Bool = true]
)::Nothing

Delete a matrix property with some name for some rows_axis and columns_axis from daf.

If relayout (the default), this will also delete the matrix in the other layout (that is, with flipped axes).

This first verifies the rows_axis and columns_axis exist in daf. If must_exist (the default), this also verifies the name matrix exists for the rows_axis and columns_axis.

cells = example_cells_daf()
println(has_matrix(cells, "gene", "cell", "UMIs"))
println(has_matrix(cells, "gene", "cell", "UMIs"; relayout = false))
println(has_matrix(cells, "cell", "gene", "UMIs"; relayout = false))

delete_matrix!(cells, "gene", "cell", "UMIs"; relayout = false)
println()
println(has_matrix(cells, "gene", "cell", "UMIs"))
println(has_matrix(cells, "gene", "cell", "UMIs"; relayout = false))
println(has_matrix(cells, "cell", "gene", "UMIs"; relayout = false))

delete_matrix!(cells, "gene", "cell", "UMIs"; must_exist = false)
println()
println(has_matrix(cells, "gene", "cell", "UMIs"))
println(has_matrix(cells, "gene", "cell", "UMIs"; relayout = false))
println(has_matrix(cells, "cell", "gene", "UMIs"; relayout = false))

# output

true
true
true

true
false
true

false
false
false

Creating properties

DataAxesFormats.Writers.empty_dense_vector! Function
empty_dense_vector!(
    fill::Function,
    daf::DafWriter,
    axis::AbstractString,
    name::AbstractString,
    eltype::Type{<:StorageReal};
    [overwrite::Bool = false]
)::Any

Create an empty dense vector property with some name for some axis in daf, pass it to fill, and return the result.

The vector passed to fill is uninitialized; fill is expected to set all its values. This saves creating a copy of the vector before setting it in the data, which makes a huge difference when creating vectors on disk (using memory mapping). For this reason, this does not work for strings, as they do not have a fixed size.

This first verifies the axis exists in daf and that the property name isn't "name". If not overwrite (the default), this also verifies the name vector does not exist for the axis.
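
Because fill is the first argument, the call reads naturally with Julia's do-block syntax. The sketch below assumes the package's in-memory MemoryDaf format and uses invented axis and property names; it also assumes that "return the result" means the value returned by fill.

```julia
using DataAxesFormats

# Assumption: MemoryDaf is the in-memory format provided by the same package.
# The axis and property names below are hypothetical.
daf = MemoryDaf(; name = "example")
add_axis!(daf, "cell", ["C1", "C2", "C3"])

total = empty_dense_vector!(daf, "cell", "age", Int32) do ages
    # `ages` is uninitialized storage owned by `daf`; set every entry.
    ages .= [10, 20, 30]
    return sum(ages)  # assumed to become the result of the whole call
end

println(total)
println(get_vector(daf, "cell", "age")[2])
```

The point of the design is that the values are written directly into the storage, so no temporary copy of the vector is needed.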

DataAxesFormats.Writers.empty_sparse_vector! Function
empty_sparse_vector!(
    fill::Function,
    daf::DafWriter,
    axis::AbstractString,
    name::AbstractString,
    eltype::Type{<:StorageReal},
    nnz::StorageInteger,
    indtype::Maybe{Type{<:StorageInteger}} = nothing;
    [overwrite::Bool = false]
)::Any

Create an empty sparse vector property with some name for some axis in daf, pass its parts (nzind and nzval) to fill, and return the result.

If indtype is not specified, it is chosen automatically to be the smallest unsigned integer type needed for the vector.

The vectors passed to fill are uninitialized; fill is expected to set all the values of nzind and nzval. Specifying the nnz makes their sizes known in advance, to allow pre-allocating disk data. For this reason, this does not work for strings, as they do not have a fixed size.

This severely restricts the usefulness of this function, because typically nnz is only known after fully computing the vector. Still, in some cases a large sparse vector is created by concatenating several smaller ones; this function allows doing so directly into the data vector, avoiding a copy in case of memory-mapped disk formats.

Warning

It is the caller's responsibility to fill the two vectors with valid data. Specifically, following the layout of Julia's SparseVector, you must ensure:

  • 1 <= nzind[1]
  • nzind[i] < nzind[i + 1]
  • nzind[end] <= the number of entries of the axis

This first verifies the axis exists in daf and that the property name isn't "name". If not overwrite (the default), this also verifies the name vector does not exist for the axis.
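
A minimal sketch of filling the two parts follows. It assumes the package's in-memory MemoryDaf format, uses invented axis and property names, and assumes nzind holds the sorted, distinct 1-based positions of the stored entries, as in Julia's SparseVector.

```julia
using DataAxesFormats

# Assumption: MemoryDaf is the in-memory format provided by the same package.
# The axis and property names below are hypothetical.
daf = MemoryDaf(; name = "example")
add_axis!(daf, "cell", ["C1", "C2", "C3", "C4", "C5"])

# Two stored entries (nnz = 2), at positions 2 and 5 of the axis.
empty_sparse_vector!(daf, "cell", "hits", Float32, 2) do nzind, nzval
    nzind .= [2, 5]      # sorted positions of the stored entries
    nzval .= [1.5, 2.5]  # the matching values, in the same order
    return nothing
end

hits = get_vector(daf, "cell", "hits")
println(hits[2])
println(hits[1])
```

Positions that are not listed in nzind read back as zero.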

DataAxesFormats.Writers.empty_dense_matrix! Function
empty_dense_matrix!(
    fill::Function,
    daf::DafWriter,
    rows_axis::AbstractString,
    columns_axis::AbstractString,
    name::AbstractString,
    eltype::Type{<:StorageReal};
    [overwrite::Bool = false]
)::Any

Create an empty dense matrix property with some name for some rows_axis and columns_axis in daf, pass it to fill, and return the result. Since this is Julia, this will be a column-major matrix.

The matrix passed to fill is uninitialized; fill is expected to set all its values. This saves creating a copy of the matrix before setting it in daf, which makes a huge difference when creating matrices on disk (using memory mapping). For this reason, this does not work for strings, as they do not have a fixed size.

This first verifies the rows_axis and columns_axis exist in daf, and that the matrix is column-major and of the appropriate size. If not overwrite (the default), this also verifies the name matrix does not exist for the rows_axis and columns_axis.
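
The following sketch fills a dense matrix in place and then stores the flipped layout with relayout_matrix!. It assumes the package's in-memory MemoryDaf format; the axes and the "UMIs" property are invented for the illustration.

```julia
using DataAxesFormats

# Assumption: MemoryDaf is the in-memory format provided by the same package.
# The axes, entries and property name below are hypothetical.
daf = MemoryDaf(; name = "example")
add_axis!(daf, "cell", ["C1", "C2", "C3"])
add_axis!(daf, "gene", ["G1", "G2"])

empty_dense_matrix!(daf, "cell", "gene", "UMIs", UInt16) do umis
    # `umis` has one row per cell and one column per gene.
    # Fill it column by column, which suits the column-major layout.
    for gene_index in axes(umis, 2)
        umis[:, gene_index] .= gene_index
    end
    return nothing
end

# Only the (cell, gene) layout exists so far; store the flipped layout too.
relayout_matrix!(daf, "cell", "gene", "UMIs")
println(get_matrix(daf, "cell", "gene", "UMIs")[3, 2])
```

Iterating over columns in the outer loop keeps the writes contiguous in memory, which matters most for memory-mapped disk formats.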

DataAxesFormats.Writers.empty_sparse_matrix! Function
empty_sparse_matrix!(
    fill::Function,
    daf::DafWriter,
    rows_axis::AbstractString,
    columns_axis::AbstractString,
    name::AbstractString,
    eltype::Type{<:StorageReal},
    nnz::StorageInteger,
    indtype::Maybe{Type{<:StorageInteger}} = nothing;
    [overwrite::Bool = false]
)::Any

Create an empty sparse matrix property with some name for some rows_axis and columns_axis in daf, pass its parts (colptr, rowval and nzval) to fill, and return the result.

If indtype is not specified, it is chosen automatically to be the smallest unsigned integer type needed for the matrix.

The vectors passed to fill are uninitialized; fill is expected to set all the values of colptr, rowval and nzval. Specifying the nnz makes their sizes known in advance, to allow pre-allocating disk space. For this reason, this does not work for strings, as they do not have a fixed size.

This severely restricts the usefulness of this function, because typically nnz is only known after fully computing the matrix. Still, in some cases a large sparse matrix is created by concatenating several smaller ones; this function allows doing so directly into the data, avoiding a copy in case of memory-mapped disk formats.

Warning

It is the caller's responsibility to fill the three vectors with valid data. Specifically, you must ensure:

  • colptr[1] == 1
  • colptr[end] == nnz + 1
  • colptr[i] <= colptr[i + 1]
  • for all j, for all i such that colptr[j] <= i and i + 1 < colptr[j + 1], 1 <= rowval[i] < rowval[i + 1] <= nrows

This first verifies the rows_axis and columns_axis exist in daf. If not overwrite (the default), this also verifies the name matrix does not exist for the rows_axis and columns_axis.
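
To make the colptr, rowval and nzval layout concrete, here is a small sketch that writes a 3 x 2 sparse matrix with three stored entries. It assumes the package's in-memory MemoryDaf format and uses invented axes and an invented "UMIs" property.

```julia
using DataAxesFormats

# Assumption: MemoryDaf is the in-memory format provided by the same package.
# The axes, entries and property name below are hypothetical.
daf = MemoryDaf(; name = "example")
add_axis!(daf, "cell", ["C1", "C2", "C3"])
add_axis!(daf, "gene", ["G1", "G2"])

# Three stored entries (nnz = 3):
#   column 1 (G1) holds rows 1 and 3; column 2 (G2) holds row 2.
empty_sparse_matrix!(daf, "cell", "gene", "UMIs", UInt16, 3) do colptr, rowval, nzval
    colptr .= [1, 3, 4]    # column j owns entries colptr[j] : colptr[j + 1] - 1
    rowval .= [1, 3, 2]    # row indices, increasing within each column
    nzval .= [10, 30, 20]  # the stored values, in the same order
    return nothing
end

umis = get_matrix(daf, "cell", "gene", "UMIs")
println(umis[3, 1])
println(umis[2, 2])
println(umis[1, 2])
```

Here colptr[end] is 4, that is nnz + 1, and entries that are not stored (such as row 1 of column 2) read back as zero.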
