Views

DataAxesFormats.Views Module

Create a different view of Daf data using queries. This is a very flexible mechanism which can be used for a variety of use cases. A simple way of using this is to view a subset of the data as a Daf data set. A variant of this also renames the data properties to adapt them to the requirements of some computation. This makes it simpler to create such tools (using fixed, generic property names) and apply them to arbitrary data (with arbitrary specific property names).

DataAxesFormats.Views.viewer Function
viewer(
    daf::DafReader;
    [name::Maybe{AbstractString} = nothing,
    axes::Maybe{ViewAxes} = nothing,
    data::Maybe{ViewData} = nothing]
)::DafReadOnly

Wrap daf data with a read-only DafView. The exposed view is defined by a set of queries applied to the original data. These queries are evaluated only when data is actually accessed. Therefore, creating a view is a relatively cheap operation.

If the name is not specified, the result name will be based on the name of daf, with a .view suffix.

Queries are listed separately for axes and data.

Note

As an optimization, calling viewer with all-empty (default) arguments returns a simple DafReadOnlyWrapper, that is, it is equivalent to calling read_only.

DataAxesFormats.Views.ViewAxis Type

Specify an axis to expose from a view.

This is a pair (similar to initializing a Dict). The key is the name of the axis in the view and the value is the query describing how to compute it from the base repository. We also allow using a tuple to make it easy to invoke the API from other languages such as Python which do not have the concept of a Pair.

If the value is nothing, then the axis will not be exposed by the view. If the value is "=", then the axis will be exposed with the same entries as in the original daf data. If the value is a simple name, it is interpreted as an axis name (that is, "obs" => "cell" is the same as "obs" => q"@ cell"). Otherwise the query should be a valid axis query. For example, "batch" => q"@ batch [ age > 1 ]" will expose the batch axis, but only including the batches whose age property is greater than 1.

If the key is "*", then it is replaced by all the names of the axes of the wrapped daf data. The only valid queries in this case are nothing to hide all the axes or "=" to expose all the axes. The latter is often used as the first pair, followed by additional ones to hide or override specific axes.

DataAxesFormats.Views.ViewAxes Type

Specify all the axes to expose from a view. The order of the pairs (or tuples) matters - the last one wins. We would have liked to specify this as AbstractVector{<:ViewAxis} but Julia in its infinite wisdom does not consider Pair{String, String} to be a subtype of Pair{AbstractString, AbstractString}.
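
The ordering rule can be illustrated with a small model in plain Julia. This is only a sketch of the documented behavior ("*" expansion, nothing to hide, "=" to expose as-is, last one wins); the resolve_axes helper is hypothetical and is not part of DataAxesFormats, and the axis names are made up for the example.

```julia
# Hypothetical helper (NOT part of DataAxesFormats): models how a list of
# axis specifications resolves, following the documented "last one wins" rule.
function resolve_axes(base_axes, view_axes)
    exposed = Dict{String, Any}()
    # Destructuring works for both Pair and Tuple entries.
    for (name, query) in view_axes
        # A "*" key stands for every axis of the base data.
        targets = name == "*" ? base_axes : [name]
        for target in targets
            if query === nothing
                delete!(exposed, target)   # hide the axis
            elseif query == "="
                exposed[target] = target   # expose it unchanged
            else
                exposed[target] = query    # expose it using the given query
            end
        end
    end
    return exposed
end

exposed = resolve_axes(
    ["cell", "gene", "batch"],                          # assumed base axes
    ["*" => "=", "batch" => nothing, ("obs", "cell")],  # pairs or tuples
)
```

Here the first entry exposes every base axis, the second hides batch again, and the third (written as a tuple) adds an obs axis computed from the base cell axis.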

DataAxesFormats.Views.ViewDatum Type

Specify a single datum to expose from a view.

Scalars are specified similarly to ViewAxis, except that a "*" key expands to all the scalars in the base repository and a simple name query is interpreted as a scalar name (that is, "quality" => "score" is the same as "quality" => q". score"). In general the query should give a scalar result, for example "total_umis" => q"@ cell @ gene :: UMIs >> Sum" will expose a total_umis scalar containing the total sum of all UMIs of all genes in all cells.

Vectors are specified similarly to scalars, but require a tuple key specifying both an axis and a property name. The axis must be exposed by the view (based on the axes parameter). If the axis is "*", it is replaced by all the exposed axis names specified by the axes parameter. Similarly, if the property name is "*" (e.g., ("gene", "*")), then it is replaced by all the vector properties of the exposed axis in the base data. Therefore, specifying ("*", "*") (or ALL_VECTORS) exposes all vector properties of all the (exposed) axes.

The value for vectors must be the suffix of a vector query based on the appropriate axis. For example, ("cell", "color") => ": type : color" will expose a vector of color for each exposed cell, which is the color of the type of the cell, even if the exposed cell axis is a subset of the original cell axis.

However, if the query starts with an axis operator, then it should be a complete query. This may require repeating the axis query in it; as a convenience, an axis operator with the special name __axis__ is replaced by the axis query. For example, suppose the cell axis is defined as "cell" => "@ cell [ type = TCell ]", then we could expose a vector of the total UMIs for each cell by saying ("cell", "total_UMIs") => "@ gene @ __axis__ :: UMIs >- Sum", which would be expanded to "@ gene @ cell [ type = TCell ] :: UMIs >- Sum" to compute the total UMIs only for the exposed cells.
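
The expansion described above can be reproduced with a plain string substitution. This is only a sketch of the textual effect using the queries from the example; it is an assumption that a simple replacement captures it, and it is not how the library implements the expansion.

```julia
# Queries taken from the example above, written as plain strings.
axis_query = "@ cell [ type = TCell ]"
vector_query = "@ gene @ __axis__ :: UMIs >- Sum"

# Illustration only: substitute the placeholder axis operator with the
# query that defines the exposed axis.
expanded = replace(vector_query, "@ __axis__" => axis_query)

println(expanded)
```

The result is a complete query that restricts the computation to the exposed cells.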

Matrices require a tuple key specifying both axes and a property name. The axes must both be exposed by the view (based on the axes parameter). Again, if either or both of the axes are "*", they are replaced by all the exposed axes (based on the axes parameter), and likewise if the name is "*", it is replaced by all the matrix properties of the axes. Normally the query is prefixed by the rows and columns axes queries, unless the query starts with an axis operator. To avoid having to repeat the axes queries in this case, saying @ __rows_axis__ will expand to the query of the rows axis and @ __columns_axis__ will expand to the query of the columns axis.

3D Tensors require a tuple key specifying the main axis, followed by two axes, and a property name. All the axes must be exposed by the view (based on the axes parameter). In this case, none of the axes may be "*", and the value can only be "=" to expose all the matrix properties of the tensor as they are or nothing to hide all of them; that is, views can expose or hide existing (possibly masked) 3D tensors, but can't be used to create new ones.

That is, assuming the gene, cell and batch axes were exposed by the axes parameter, then specifying ("batch", "cell", "gene", "is_measured") => "=" will expose the set of per-cell-per-gene matrices batch1_is_measured, batch2_is_measured, etc.

DataAxesFormats.Views.ViewData Type

Specify all the data to expose from a view. The order of the pairs (or tuples) matters - the last one wins. However, TensorKeys are interpreted after interpreting all MatrixKeys, so they will override them even if they appear earlier in the list of keys. For clarity it is best to list them at the very end of the list.

We would have liked to specify this as AbstractVector{<:ViewDatum} but Julia in its infinite wisdom does not consider Pair{String, String} to be a subtype of Pair{AbstractString, AbstractString}.
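
As a rough sketch of how the axes and data specifications fit together, the following builds both lists from the examples used in this section. The property names (color, type, quality, score) and the batch axis are assumptions about the base data, and the viewer call is left as a comment because it needs the package to be loaded and an existing DafReader (called daf here).

```julia
# Axes: expose every base axis, then hide `batch` (the last one wins).
view_axes = [
    "*" => "=",
    "batch" => nothing,
]

# Data: vectors use (axis, property) keys, scalars use plain names.
view_data = [
    ("*", "*") => "=",                      # all vectors of all exposed axes
    ("cell", "color") => ": type : color",  # a per-cell color taken from the cell's type
    "quality" => "score",                   # expose the `score` scalar as `quality`
]

# With DataAxesFormats loaded and `daf` an existing DafReader (assumed):
# view = viewer(daf; name = "example.view", axes = view_axes, data = view_data)
```

Listing the wildcard entries first and the specific overrides after them keeps the intent readable, since later entries take precedence.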

DataAxesFormats.Views.VIEW_ALL_AXES Constant

A pair to use in the axes parameter of viewer to specify all the base data axes. This is the default, so the only reason to do this is to say [VIEW_ALL_AXES, ...] - that is, follow it by some modifications.

DataAxesFormats.Views.VIEW_ALL_DATA Constant

A vector to use in the data parameter of viewer to specify that the view exposes all the data of the exposed axes. This is the default, so the only reason to do this is to say [VIEW_ALL_DATA..., ...] - that is, follow it by some modifications.

Index