Views
DataAxesFormats.Views
—
Module
Create a different view of
Daf
data using queries. This is a very flexible mechanism which can be used for a variety of use cases. A simple way of using this is to view a subset of the data as a
Daf
data set. A variant of this also renames the data properties to adapt them to the requirements of some computation. This makes it simpler to create such tools (using fixed, generic property names) and apply them to arbitrary data (with arbitrary specific property names).
DataAxesFormats.Views.DafView
—
Type
struct DafView(daf::DafReader) <: DafReader
A read-only wrapper for any
DafReader
data, which exposes an arbitrary view of it as another
DafReadOnly
. This isn't typically created manually; instead call
viewer
.
DataAxesFormats.Views.viewer
—
Function
viewer(
daf::DafReader;
[name::Maybe{AbstractString} = nothing,
axes::Maybe{ViewAxes} = nothing,
data::Maybe{ViewData} = nothing]
)::DafReadOnly
Wrap
daf
data with a read-only
DafView
. The exposed view is defined by a set of queries applied to the original data. These queries are evaluated only when data is actually accessed. Therefore, creating a view is a relatively cheap operation.
If the
name
is not specified, the result name will be based on the name of
daf
, with a
.view
suffix.
Queries are listed separately for axes and data.
As an optimization, calling
viewer
with all-empty (default) arguments returns a simple
DafReadOnlyWrapper
, that is, it is equivalent to calling
read_only
.
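As a rough illustration of the call above, here is a minimal sketch that renames an axis and exposes one vector through it. The toy data (the `cell` axis, the `age` property, the `obs` name) is hypothetical, and the sketch assumes that `MemoryDaf`, `add_axis!`, `set_vector!`, `has_axis` and `get_vector` are available from `DataAxesFormats` with the signatures used here; it has not been checked against any specific release.

```julia
using DataAxesFormats

# Hypothetical toy data set: three cells, each with an age.
daf = MemoryDaf(; name = "example!")
add_axis!(daf, "cell", ["c1", "c2", "c3"])
set_vector!(daf, "cell", "age", [1, 2, 3])

# Expose the base "cell" axis under the generic name "obs",
# together with its "age" vector. Nothing is computed until accessed.
obs_view = viewer(
    daf;
    name = "renamed",
    axes = ["obs" => "cell"],
    data = [("obs", "age") => ": age"],
)

# Accessing the data evaluates the underlying queries.
ages = [value for value in get_vector(obs_view, "obs", "age")]
```

A tool written against the generic `obs` name could then be applied to this view without knowing that the underlying axis is called `cell`.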
DataAxesFormats.Views.ViewAxis
—
Type
Specify an axis to expose from a view.
This is a pair (similar to initializing a
Dict
). The key is the name of the axis in the view and the value is the query describing how to compute it from the base repository. We also allow using a tuple to make it easy to invoke the API from other languages such as Python, which do not have the concept of a
Pair
.
If the value is
nothing
, then the axis will
not
be exposed by the view. If the value is
"="
, then the axis will be exposed with the same entries as in the original
daf
data. If the value is a plain name, it is interpreted as an axis name (that is,
"obs" => "cell"
is the same as
"obs" => q"@ cell"
). Otherwise the query should be a valid axis query. For example, saying
"batch" => q"@ batch [ age > 1 ]"
will expose the
batch
axis, but only including the batches whose
age
property is greater than 1.
If the key is
"*"
, then it is replaced by all the names of the axes of the wrapped
daf
data. The only valid queries in this case are
nothing
to hide all the axes or
"="
to expose all the axes. The latter is often used as the first pair, followed by additional ones to hide or override specific axes.
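The way an ordered list of axis pairs resolves can be modelled in a few lines of plain Julia. This is only an illustrative model of the rules described above, not the library's implementation; `resolve_axes` and the sample axis names are hypothetical.

```julia
# Illustrative model only: resolve an ordered list of axis pairs against
# the axes of the base data. "*" expands to every base axis, later pairs
# override earlier ones, and a `nothing` value hides the axis.
function resolve_axes(base_axes::Vector{String}, specs::AbstractVector)
    exposed = Dict{String, Union{String, Nothing}}()
    for (name, query) in specs
        names = name == "*" ? base_axes : [name]
        for axis in names
            exposed[axis] = query  # the last pair for an axis wins
        end
    end
    # Keep only the axes that were not hidden.
    return Dict(axis => query for (axis, query) in exposed if query !== nothing)
end

resolved = resolve_axes(
    ["cell", "gene", "batch"],
    ["*" => "=", "batch" => nothing, "cell" => "@ cell [ type = TCell ]"],
)
```

Here the leading `"*" => "="` exposes everything, and the two pairs that follow hide `batch` and restrict `cell`.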
DataAxesFormats.Views.ViewAxes
—
Type
Specify all the axes to expose from a view. The order of the pairs (or tuples) matters - the last one wins. We would have liked to specify this as
AbstractVector{<:ViewAxis}
but Julia, in its infinite wisdom, does not consider
Pair{String, String}
to be a subtype of
Pair{AbstractString, AbstractString}
.
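The language behavior behind this remark is that type parameters in Julia are invariant. The following sketch uses only Base Julia to show it; the variable names are illustrative.

```julia
# Type parameters are invariant: a Pair of concrete strings is not a
# subtype of a Pair of abstract strings.
invariant = Pair{String, String} <: Pair{AbstractString, AbstractString}

# Bounded type parameters do accept the concrete pair type.
bounded = Pair{String, String} <: Pair{<:AbstractString, <:AbstractString}

# The same holds one level up, for a vector of such pairs.
pairs_vector = ["obs" => "cell", "var" => "gene"]
vector_matches = pairs_vector isa AbstractVector{<:Pair{<:AbstractString, <:AbstractString}}
```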
DataAxesFormats.Views.ViewDatum
—
Type
Specify a single datum to expose from a view.
Scalars
are specified similarly to
ViewAxis
, except that a
"*"
key expands to all the scalars in the base repository and a simple name query is interpreted as a scalar name (that is,
"quality" => "score"
is the same as
"quality" => q". score"
). In general the query should give a scalar result, for example
"total_umis" => q"@ cell @ gene :: UMIs >> Sum"
will expose a
total_umis
scalar containing the total sum of all UMIs of all genes in all cells.
Vectors
are specified similarly to scalars, but require a tuple key specifying both an axis and a property name. The axis must be exposed by the view (based on the
axes
parameter). If the axis is
"*"
, it is replaced by all the exposed axis names specified by the
axes
parameter. Similarly, if the property name is
"*"
(e.g.,
("gene", "*")
), then it is replaced by all the vector properties of the exposed axis in the base data. Therefore, specifying
("*", "*")
(or
ALL_VECTORS
) exposes all the vector properties of all the (exposed) axes.
The value for vectors must be the suffix of a vector query based on the appropriate axis. For example,
("cell", "color") => ": type : color"
will expose a vector of color for each exposed cell, which is the color of the type of the cell, even if the exposed cell axis is a subset of the original cell axis.
However, if the query starts with an axis operator, then it should be a complete query. This may require repeating the axis query in it; as a convenience, an axis operator with the special name
__axis__
is replaced by the axis query. For example, suppose the cell axis is defined as
"cell" => "@ cell [ type = TCell ]"
, then we could expose a vector of the total UMIs for each cell by saying
("cell", "total_UMIs") => "@ gene @ __axis__ :: UMIs >- Sum"
, which would be expanded to
"@ gene @ cell [ type = TCell ] :: UMIs >- Sum"
to compute the total UMIs only for the exposed cells.
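The two ways a vector value turns into a full query can be sketched as plain string manipulation. This is a simplified model of the rules above, not the library's code; `full_vector_query` is a hypothetical helper, and real queries are parsed rather than spliced as strings.

```julia
# Illustrative model only: build the full query for a vector value.
# A value starting with an axis operator ("@") is taken as a complete
# query in which "@ __axis__" stands for the axis query; any other
# value is a suffix appended to the axis query.
function full_vector_query(axis_query::AbstractString, value::AbstractString)
    if startswith(value, "@")
        return replace(value, "@ __axis__" => axis_query)
    else
        return axis_query * " " * value
    end
end

axis_query = "@ cell [ type = TCell ]"

# Suffix form: the axis query is prepended.
color_query = full_vector_query(axis_query, ": type : color")

# Complete form: the placeholder is replaced by the axis query.
umis_query = full_vector_query(axis_query, "@ gene @ __axis__ :: UMIs >- Sum")
```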
Matrices
require a tuple key specifying both axes and a property name. The axes must both be exposed by the view (based on the
axes
parameter). Again, if either or both of the axes are
"*"
, they are replaced by all the exposed axes (based on the
axes
parameter), and likewise if the name is
"*"
, it is replaced by all the matrix properties of the axes. Normally the query is prefixed by the rows and columns axes queries, unless the query starts with an axis operator. To avoid having to repeat the axes queries in this case, saying
@ __rows_axis__
will expand to the query of the rows axis and
@ __columns_axis__
will expand to the query of the columns axis.
3D Tensors
require a tuple key specifying the main axis, followed by two axes, and a property name. All the axes must be exposed by the view (based on the
axes
parameter). In this case, none of the axes may be
"*"
, and the value can only be
"="
to expose all the matrix properties of the tensor as they are or
nothing
to hide all of them; that is, views can expose or hide existing (possibly masked) 3D tensors, but can't be used to create new ones.
For example, assuming the
gene
,
cell
and
batch
axes were exposed by the
axes
parameter, then specifying
("batch", "cell", "gene", "is_measured") => "="
will expose the set of per-cell-per-gene matrices
batch1_is_measured
,
batch2_is_measured
, etc.
DataAxesFormats.Views.ViewData
—
Type
Specify all the data to expose from a view. The order of the pairs (or tuples) matters - the last one wins. However,
TensorKey
s are interpreted after interpreting all
MatrixKey
s, so they will override them even if they appear earlier in the list of keys. For clarity it is best to list them at the very end of the list.
We would have liked to specify this as
AbstractVector{<:ViewDatum}
but Julia, in its infinite wisdom, does not consider
Pair{String, String}
to be a subtype of
Pair{AbstractString, AbstractString}
.
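A data specification is just an ordered vector of pairs, so it can be written out and inspected without touching any data. The sketch below is a hypothetical example using the key shapes described above (a plain name for scalars, and tuples of two, three and four names for vectors, matrices and tensors); the property names are made up for illustration.

```julia
# Illustrative data specification; all property names are hypothetical.
data = [
    "*" => "=",                                       # every base scalar, as is
    ("cell", "color") => ": type : color",            # vector computed via the cell type
    ("cell", "gene", "UMIs") => "=",                  # matrix exposed as is
    ("batch", "cell", "gene", "is_measured") => "=",  # tensor key, listed last for clarity
]

# The number of names in each key distinguishes the kinds of data.
key_sizes = [key isa Tuple ? length(key) : 1 for (key, _) in data]

# Such a vector would be passed as: viewer(daf; data = data)
```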
DataAxesFormats.Views.ALL_SCALARS
—
Constant
A key to use in the
data
parameter of
viewer
to specify all the base data scalars.
DataAxesFormats.Views.VIEW_ALL_SCALARS
—
Constant
A pair to use in the
data
parameter of
viewer
to specify all the base data scalars.
DataAxesFormats.Views.ALL_AXES
—
Constant
A key to use in the
axes
parameter of
viewer
to specify all the base data axes.
DataAxesFormats.Views.VIEW_ALL_AXES
—
Constant
A pair to use in the
axes
parameter of
viewer
to specify all the base data axes. This is the default, so the only reason to do this is to say
[VIEW_ALL_AXES, ...]
- that is, follow it by some modifications.
DataAxesFormats.Views.ALL_VECTORS
—
Constant
A key to use in the
data
parameter of
viewer
to specify all the vectors of the exposed axes.
DataAxesFormats.Views.VIEW_ALL_VECTORS
—
Constant
A pair to use in the
data
parameter of
viewer
to specify all the vectors of the exposed axes.
DataAxesFormats.Views.ALL_MATRICES
—
Constant
A key to use in the
data
parameter of
viewer
to specify all the matrices of the exposed axes.
DataAxesFormats.Views.VIEW_ALL_MATRICES
—
Constant
A pair to use in the
data
parameter of
viewer
to specify all the matrices of the exposed axes.
DataAxesFormats.Views.VIEW_ALL_DATA
—
Constant
A vector to use in the
data
parameter of
viewer
to specify that the view exposes all the data of the exposed axes. This is the default, so the only reason to do this is to say
[VIEW_ALL_DATA..., ...]
- that is, follow it by some modifications.
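The "defaults followed by modifications" pattern might look like the sketch below, which exposes everything and then hides one axis and one vector. The toy data is hypothetical, and the sketch assumes that `MemoryDaf`, `add_axis!`, `set_vector!`, `has_axis` and `has_vector` are available from `DataAxesFormats` with these signatures and that the constants are exported; it has not been verified against a specific release.

```julia
using DataAxesFormats

# Hypothetical toy data: two cells with two properties, and one batch.
daf = MemoryDaf(; name = "toy!")
add_axis!(daf, "cell", ["c1", "c2"])
add_axis!(daf, "batch", ["b1"])
set_vector!(daf, "cell", "age", [3, 5])
set_vector!(daf, "cell", "color", ["red", "blue"])

# Start from the defaults, then hide the "batch" axis and the "age" vector.
trimmed = viewer(
    daf;
    axes = [VIEW_ALL_AXES, "batch" => nothing],
    data = [VIEW_ALL_DATA..., ("cell", "age") => nothing],
)
```

Since the last pair wins, the modifications have to come after the defaults.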
Index
- DataAxesFormats.Views
- DataAxesFormats.Views.ALL_AXES
- DataAxesFormats.Views.ALL_MATRICES
- DataAxesFormats.Views.ALL_SCALARS
- DataAxesFormats.Views.ALL_VECTORS
- DataAxesFormats.Views.VIEW_ALL_AXES
- DataAxesFormats.Views.VIEW_ALL_DATA
- DataAxesFormats.Views.VIEW_ALL_MATRICES
- DataAxesFormats.Views.VIEW_ALL_SCALARS
- DataAxesFormats.Views.VIEW_ALL_VECTORS
- DataAxesFormats.Views.DafView
- DataAxesFormats.Views.ViewAxes
- DataAxesFormats.Views.ViewAxis
- DataAxesFormats.Views.ViewData
- DataAxesFormats.Views.ViewDatum
- DataAxesFormats.Views.viewer