Queries

Construction

DataAxesFormats.Queries.Query Type

A query is a description of a (subset of a) procedure for extracting some data from a DafReader. A full query is a sequence of QueryOperation objects that, when applied to some DafReader, results in a set of names, or in a scalar, vector or matrix.

Queries can be constructed in two ways. In code, a query can be built by chaining query operations (e.g., the query Axis("gene") |> LookupVector("is_marker") looks up the is_marker vector property of the gene axis). Alternatively, a query can be written as a string, which is then parsed into a Query object (e.g., the above can be written as parse_query("@gene:is_marker") or, using the @q_str macro, as q"@gene:is_marker" ).

Being able to represent queries as strings allows for reading them from configuration files and letting the user input them in an application UI (e.g., allowing the user to specify the X, Y and/or colors of a scatter plot using queries). At the same time, being able to incrementally build queries using code allows for convenient reuse (e.g., reusing axis sub-queries in Daf views), without having to go through the string representation.

To apply a query to some DafReader data, invoke get_query (you can also use the shorthand daf[query] instead of get_query(daf, query) , or write query |> get_query(daf) , which is useful when constructing a query from parts using |> ). By default, get_query caches its results in memory as QueryData to speed up repeated queries. This may lock up large amounts of memory. Using daf[query] does not cache the results; you can also use empty_cache! to release the memory.
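
The following sketch puts the construction and application forms described above side by side. It assumes that the DataAxesFormats package is installed and that example_cells_daf is available after using DataAxesFormats, as the examples in this section suggest; the is_lateral property of the gene axis is taken from those examples. The variable names are illustrative only.

```julia
using DataAxesFormats

# Example data, as used by the other examples in this section.
cells = example_cells_daf()

# Three ways to describe the same query.
built = Axis("gene") |> LookupVector("is_lateral")   # chained operations
parsed = parse_query("@ gene : is_lateral")          # parsed from a string
literal = q"@ gene : is_lateral"                     # parsed by the @q_str macro

# Three ways to apply a query to the data.
via_function = get_query(cells, built)
via_index = cells[parsed]
via_pipe = literal |> get_query(cells)

println(length(via_function))
```

All three results should hold the same vector, one entry per gene.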

Note

This started as a very simple query language (which it still is, for the simple cases) but became complex to allow for useful but complex scenarios. In particular, the approach here of using a concatenative language (similar to ggplot ) makes simple things simpler, but is less natural for some of the more advanced operations. However, using an RPN or a LISP notation to better support such cases would have ended up with a much less nice syntax for the simple cases.

Hopefully we have covered sufficient ground so that we won't need to add further operations (except for more element-wise and reduction operations). In most cases, you can write code that accesses the vectors/matrix data and performs whatever computation you want instead of writing a complex query; however, this isn't an option when defining views or adapters, which rely on the query mechanism for specifying the data.

Execution Model

Queries consist of a combination of one or more of the operators listed below. However, the execution of the query is not one operator at a time. Instead, at each point, a phrase consisting of several operators is executed as a single operation. Each such step modifies the state of the query (starting with an empty state). When the query is done, the result is extracted from the final query state.

The query state is a stack which starts empty. Each phrase only applies if the top of the stack matches some pattern (e.g., looking up a vector property requires that the top of the stack contain an axis specification). The execution of the phrase pops the matching top stack elements, performs some operations on them, and then pushes some elements onto the stack.

This approach simplifies both the code and the mental model for the query language. For example, suppose we look up a scalar property using the LookupScalar operator (e.g., ". version") and follow it with the IfMissing operator (e.g., "|| 0.0.0") to provide a default value to return if the property doesn't exist. The phrase LookupScalar("version") |> IfMissing("0.0.0") is then executed as a single operation, invoking get_scalar(daf, "version"; default = "0.0.0") and pushing a scalar into the query state stack. This eliminates the issue of "what is the state of the query after executing a LookupScalar of a missing scalar property, before executing IfMissing".
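
The phrase-at-a-time model can be illustrated with a toy interpreter. This is a minimal sketch in plain Julia and not the package's implementation: the ToyLookupScalar and ToyIfMissing types, the run_toy function, the Dict standing in for a DafReader, and the sample data are all hypothetical.

```julia
# Toy model of phrase-at-a-time execution. All names here are hypothetical
# and unrelated to the real DataAxesFormats implementation.
abstract type ToyOperation end

struct ToyLookupScalar <: ToyOperation
    name::String
end

struct ToyIfMissing <: ToyOperation
    default::Any
end

# Execute operations against `scalars`, a Dict standing in for a DafReader.
function run_toy(scalars::Dict{String, Any}, operations::Vector{ToyOperation})
    stack = Any[]  # The query state starts as an empty stack.
    index = 1
    while index <= length(operations)
        operation = operations[index]
        operation isa ToyLookupScalar || error("no phrase starts with $(typeof(operation))")
        # Extend the phrase greedily: a lookup directly followed by a default
        # is executed as one step, so no "missing" state is ever pushed.
        if index < length(operations) && operations[index + 1] isa ToyIfMissing
            push!(stack, get(scalars, operation.name, operations[index + 1].default))
            index += 2
        else
            haskey(scalars, operation.name) || error("missing scalar: $(operation.name)")
            push!(stack, scalars[operation.name])
            index += 1
        end
    end
    return only(stack)  # The result is extracted from the final state.
end

scalars = Dict{String, Any}("organism" => "human")  # made-up data
existing = run_toy(scalars, ToyOperation[ToyLookupScalar("organism")])
defaulted = run_toy(scalars, ToyOperation[ToyLookupScalar("version"), ToyIfMissing("0.0.0")])
println(existing)
println(defaulted)
```

Because the lookup and its default are consumed together, the interpreter never has to represent a "missing scalar" on the stack.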

A disadvantage of this approach is that the semantics of an operator depends on the phrase it is used in. However, we defined the operators such that they would "make sense" in the context of the different phrases they participate in. This allows us to provide a list of operators with a coherent function for each:

Query Operators

Operator  Implementation      Description
@         Axis                Specify an axis, e.g. for looking up a vector or matrix property.
=@        AsAxis              Specify that values are axis entries, e.g. for looking up another vector or matrix property.
@❘        SquareColumnIs      Specify which column to slice from a square matrix.
@-        SquareRowIs         Specify which row to slice from a square matrix.
/         GroupBy             Group elements of a vector by values of another vector of the same length.
❘/        GroupColumnsBy      Group columns of a matrix by values of a vector with one value per row.
-/        GroupRowsBy         Group rows of a matrix by values of a vector with one value per row.
%         EltwiseOperation    Specify an element-wise operation to apply to scalar, vector or matrix data.
>>        ReductionOperation  Specify a reduction operation to convert vector or matrix data to a single scalar value.
>❘        ReduceToColumn      Specify a reduction operation to convert matrix data to a single column.
>-        ReduceToRow         Specify a reduction operation to convert matrix data to a single row.
❘❘        IfMissing           Specify a default value to use when looking up a property that doesn't exist, or when reducing an empty vector or matrix into a single scalar value.
*         CountBy             Count in a matrix the number of times each combination of values from two vectors coincide.
?         Names               Ask for a set of names of axes or properties that can be used to look up data.
??        IfNot               Specify a final value to use when performing chained lookup operations based on an empty value.
.         LookupScalar        Lookup a scalar property.
:         LookupVector        Lookup a vector property based on some axis.
::        LookupMatrix        Lookup a matrix property based on a pair of axes (rows and columns).
<         IsLess              Compare less than a value.
<=        IsLessEqual         Compare less than or equal to a value.
=         IsEqual             Compare equal to a value.
!=        IsNotEqual          Compare not equal to a value.
>=        IsGreaterEqual      Compare greater than or equal to a value.
>         IsGreater           Compare greater than a value.
~         IsMatch             Compare by matching to a regular expression.
!~        IsNotMatch          Compare by not matching to a regular expression.
[         BeginMask           Begin computing a mask on an axis.
[ !       BeginNegatedMask    Begin computing a mask on an axis, negating it.
]         EndMask             Complete computing a mask on an axis.
&         AndMask             Merge masks by AND Boolean operation.
& !       AndNegatedMask      Merge masks by AND NOT Boolean operation.
❘         OrMask              Merge masks by OR Boolean operation.
❘ !       OrNegatedMask       Merge masks by OR NOT Boolean operation.
^         XorMask             Merge masks by XOR Boolean operation.
^ !       XorNegatedMask      Merge masks by XOR NOT Boolean operation.

Note

Due to Julia's Documenter limitations, the ASCII | character ( &#124; , vertical bar) is replaced by the Unicode ❘ character ( &#x2758; , light vertical bar) in the above table. Sigh.

Query Syntax

Obviously not all possible combinations of operators make sense (e.g., LookupScalar("is_marker") |> Axis("cell") will not work). Valid queries are built out of supported phrases (each including one or more operators), combined into a coherent query. For the full list of valid phrases and queries, see NAMES_QUERY , SCALAR_QUERY , VECTOR_QUERY and MATRIX_QUERY below.

DataAxesFormats.Queries.QueryString Type

Most operations that take a query allow passing a string to be parsed into a query, or an actual Query object. This type is used as a convenient notation for such query parameters.

DataAxesFormats.Queries.parse_query Function
parse_query(
    query_string::AbstractString,
    operand_only::Maybe{Type{<:QueryOperation}} = nothing
)::QueryOperation

Parse a query (or a fragment of a query). If the query_string contains only an operand (just a name) and operand_only is specified, then operand_only is used as the type of the query operation (i.e., parse_query("metacell") is an error, but parse_query("metacell", Axis) is the same as Axis("metacell") ). This is useful when providing suffix queries (e.g., for get_frame ).

DataAxesFormats.Queries.@q_str Macro
q"..."

Shorthand for parsing a literal string as a Query . This is equivalent to parse_query(raw"...") , that is, a \ can be placed in the string without escaping it (except for before a " ). This is very convenient for literal queries (e.g., q"@ cell = ATCG\:B1 : batch" == parse_query(raw"@ cell = ATCG\:B1 : batch") == parse_query("@ cell = ATCG\\:B1 : batch") == Axis("cell") |> IsEqual("ATCG:B1") |> LookupVector("batch") ).

println("@ cell = ATCG\\:B1 : batch")
println(q"@ cell = ATCG\:B1 : batch")

# output

@ cell = ATCG\:B1 : batch
@ cell = ATCG\:B1 : batch

Syntax

Each description of a part of the query syntax is accompanied by a diagram.

Legend

[Diagram: LEGEND. Each syntax diagram shows query state stack nodes (including the empty state), phrase nodes giving the query phrase string together with the equivalent Query() |> Phrase() |> Objects() code, the name of the query operation each phrase performs, references to other query components, and the final query result. Arrows lead from a state (or from the empty state) into a phrase or a component reference, and from there to a new query state or to the final result.]

DataAxesFormats.Queries.NAMES_QUERY Constant

A query returning a set of names. Valid phrases are:

  • Looking up the set of names of the scalar properties ( . ? ). Example:
cells = example_cells_daf()
cells[". ?"]

# output

KeySet for a Dict{AbstractString, Union{Bool, Float32, Float64, Int16, Int32, Int64, Int8, UInt16, UInt32, UInt64, UInt8, AbstractString}} with 2 entries. Keys:
  "organism"
  "reference"

  • Looking up the set of names of the axes ( @ ? ). Example:
cells = example_cells_daf()
cells["@ ?"]

# output

KeySet for a Dict{AbstractString, AbstractVector{<:AbstractString}} with 4 entries. Keys:
  "gene"
  "experiment"
  "donor"
  "cell"

  • Looking up the set of names of the vector properties of an axis (e.g., @ cell : ? ).
cells = example_cells_daf()
cells["@ gene : ?"]

# output

KeySet for a Dict{AbstractString, AbstractVector{T} where T<:(Union{Bool, Float32, Float64, Int16, Int32, Int64, Int8, UInt16, UInt32, UInt64, UInt8, S} where S<:AbstractString)} with 1 entry. Keys:
  "is_lateral"

  • Looking up the set of names of the matrix properties of a pair of axes (e.g., @ cell @ gene :: ? ).
cells = example_cells_daf()
cells["@ cell @ gene :: ?"]

# output

Set{AbstractString} with 1 element:
  "UMIs"

Syntax diagram:

[Diagram: NAMES. From the empty state, each phrase below leads to the names state.]

  • names of matrices: @ rows_axis @ columns_axis :: ?
    Axis(rows_axis) |> Axis(columns_axis) |> LookupMatrix() |> Names()
  • names of vectors: @ axis : ?
    Axis(axis) |> LookupVector() |> Names()
  • names of axes: @ ?
    Axis() |> Names()
  • names of scalars: . ?
    LookupScalar() |> Names()

DataAxesFormats.Queries.SCALAR_QUERY Constant

A query returning a scalar result. Valid phrases are:

  • Looking up a scalar property ( . scalar-property , . scalar-property || default-value ).
  • Looking up a vector, and picking a specific entry in it ( : vector-property @ axis = entry , : vector-property || default-value @ axis = entry ).
  • Looking up a matrix, and picking a specific entry in it ( :: matrix-property @ rows-axis = row-entry @ columns-axis = column-entry , :: matrix-property || default-value @ rows-axis = row-entry @ columns-axis = column-entry ).
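
The lookup phrases above might be used as in the following sketch. It assumes the DataAxesFormats package and its example_cells_daf data; the organism scalar, the absent version scalar, the age vector and the N16 donor are taken from other examples in this section, and the query strings simply follow the phrase patterns listed above.

```julia
using DataAxesFormats

cells = example_cells_daf()

# A scalar property that exists in the example data.
organism = cells[". organism"]
println(organism)

# A scalar property that does not exist in the example data,
# so the default value given after || is used instead.
version = cells[". version || 0.0.0"]
println(version)

# A single entry of the age vector property of the donor axis.
age_of_donor = cells[": age @ donor = N16"]
println(age_of_donor)
```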

In addition, you can use EltwiseOperation and ReductionOperation :

  • Transform any scalar (...scalar... % Eltwise operation... ). Actually, we don't currently have any element-wise operations that apply to strings, but we can add some if useful.

  • Reduce any vector to a scalar (...vector... >> Reduction operation... ) - see VECTOR_QUERY . Example:

cells = example_cells_daf()
# Number of genes which are marked as lateral.
cells["@ gene : is_lateral >> Sum type Int64"]

# output

438

  • Reduce any matrix to a scalar (...matrix... >> Reduction operation... ) - see MATRIX_QUERY . Example:
cells = example_cells_daf()
# Total number of measured UMIs in the data.
cells["@ cell @ gene :: UMIs >> Sum type Int64"]

# output

1171936

Syntax diagram:

[Diagram: SCALAR. Each phrase below leads to the scalar state.]

From the empty state:

  • lookup scalar: . scalar_property || default_value
    LookupScalar(scalar_property) |> IfMissing(default_value)
  • lookup vector entry: : vector_property || default_value @ axis = value
    LookupVector(vector_property) |> IfMissing(default_value) |> Axis(axis) |> IsEqual(value)
  • lookup matrix entry: :: matrix_property || default_value @ rows_axis = row_value @ columns_axis = column_value
    LookupMatrix(matrix_property) |> IfMissing(default_value) |> Axis(rows_axis) |> IsEqual(row_value) |> Axis(columns_axis) |> IsEqual(column_value)

From the vector state:

  • reduce vector to scalar: >> Reduction param value ... || default_value
    Reduction( param = value, ... ) |> IfMissing(default_value)

From the matrix state:

  • reduce matrix to scalar: >> Reduction param value ... || default_value
    Reduction( param = value, ... ) |> IfMissing(default_value)

From the scalar state:

  • element-wise of scalar: % Eltwise param value ...
    Eltwise( param = value, ... )

DataAxesFormats.Queries.VECTOR_QUERY Constant

A query returning a vector result. Valid phrases are:

  • Looking up a vector axis ( @ axis ). This gives us a vector of the axis entries. Example:
cells = example_cells_daf()
cells["@ experiment"]

# output

23-element Named SparseArrays.ReadOnly{SubString{StringViews.StringView{Vector{UInt8}}}, 1, Vector{SubString{StringViews.StringView{Vector{UInt8}}}}}
experiment       │
─────────────────┼───────────────────
demux_01_02_21_1 │ "demux_01_02_21_1"
demux_01_02_21_2 │ "demux_01_02_21_2"
demux_01_03_21_1 │ "demux_01_03_21_1"
demux_04_01_21_1 │ "demux_04_01_21_1"
demux_04_01_21_2 │ "demux_04_01_21_2"
demux_07_03_21_1 │ "demux_07_03_21_1"
demux_07_03_21_2 │ "demux_07_03_21_2"
demux_07_12_20_1 │ "demux_07_12_20_1"
⋮                                   ⋮
demux_21_02_21_1 │ "demux_21_02_21_1"
demux_21_02_21_2 │ "demux_21_02_21_2"
demux_21_12_20_1 │ "demux_21_12_20_1"
demux_21_12_20_2 │ "demux_21_12_20_2"
demux_22_02_21_1 │ "demux_22_02_21_1"
demux_22_02_21_2 │ "demux_22_02_21_2"
demux_28_12_20_1 │ "demux_28_12_20_1"
demux_28_12_20_2 │ "demux_28_12_20_2"

  • Applying a mask to an axis (...axis... [ ...mask... ] ) - see VECTOR_MASK .
  • Looking up the values of a property based on a (possibly masked) axis (...axis... : ...lookup...) - see VECTOR_LOOKUP .
  • Applying some operation to a vector we looked up (...vector... % Eltwise operation... ) - see VECTOR_OPERATION .
  • Taking any matrix query and reducing it to a column or a row vector (...matrix... >| Reduction operation... , ...matrix... >- Reduction operation... ) - see VECTOR_FROM_MATRIX .

Syntax diagram:

[Diagram: VECTOR. Transitions between the empty state, the vector axis state, the vector values state and the matrix values state.]

  • lookup axis: @ axis , that is Axis(axis) , leads from the empty state to the vector axis state.
  • VECTOR MASK leads from the vector axis state back to the vector axis state.
  • "values are names" (implicit) leads from the vector axis state to the vector values state.
  • VECTOR LOOKUP leads from the vector axis state to the vector values state.
  • VECTOR OPERATION leads from the vector values state back to the vector values state.
  • VECTOR FROM MATRIX leads from the matrix values state to the vector values state.

DataAxesFormats.Queries.VECTOR_OPERATION Constant

A query fragment specifying some operation on a vector of values. Valid phrases are:

  • Treating the vector values as names of some axis entries and looking up some property of that axis (...vector... =@ axis-values-are-entries-of : vector-property-of-that-axis || default-value ) - see VECTOR_AS_AXIS and VECTOR_LOOKUP . Example:
metacells = example_metacells_daf()
metacells["@ metacell : type : color"]

# output

7-element Named Vector{String}
metacell  │
──────────┼────────────
M1671.28  │      "gold"
M2357.20  │      "gold"
M2169.56  │      "plum"
M2576.86  │   "#eebb6e"
M1440.15  │      "gold"
M756.63   │   "#eebb6e"
M412.08   │ "steelblue"

  • Applying some operation to a vector we looked up (...vector... % Eltwise ... ).
cells = example_cells_daf()
cells["@ donor : age % Clamp min 40 max 60 type Int64"]

# output

95-element Named Vector{Int64}
donor  │
───────┼───
N16    │ 60
N17    │ 60
N18    │ 60
N59    │ 60
N79    │ 60
N83    │ 42
N84    │ 60
N85    │ 60
⋮         ⋮
N176   │ 60
N177   │ 58
N178   │ 40
N179   │ 60
N181   │ 60
N182   │ 60
N183   │ 60
N184   │ 60

  • Comparing the values in the vector with some constant (...vector... > value ).
cells = example_cells_daf()
cells["@ donor : age > 60"]

# output

95-element Named Vector{Bool}
donor  │
───────┼──────
N16    │  true
N17    │  true
N18    │  true
N59    │  true
N79    │  true
N83    │ false
N84    │  true
N85    │  true
⋮            ⋮
N176   │  true
N177   │ false
N178   │ false
N179   │  true
N181   │  true
N182   │  true
N183   │  true
N184   │  true

  • Grouping the vector values by something and reducing each group to a single value (...vector... / vector-property >> Sum ) - see VECTOR_GROUP .

Syntax diagram:

[Diagram: VECTOR OPERATION. Transitions from the base vector values state to the final vector values state, possibly through a vector axis state.]

  • element-wise of vector: % Eltwise param value ...
    Eltwise( param = value, ... )
    leads from the base values to the final values.
  • compare vector: < or <= or = or != or >= or > or ~ or !~ value
    IsLess or IsLessEqual or IsEqual or IsNotEqual or IsGreaterEqual or IsGreater or IsMatch or IsNotMatch (value)
    leads from the base values to the final values.
  • VECTOR GROUP leads from the base values to the final values.
  • VECTOR AS AXIS leads from the base values to the vector axis state.
  • "vector property is an axis" (implicit) leads from the base values to the vector axis state.
  • VECTOR LOOKUP leads from the vector axis state to the final values.

DataAxesFormats.Queries.VECTOR_AS_AXIS Constant

A query fragment for explicitly specifying that the values of a vector are entries of an axis. Valid phrases are:

  • Using the name of the property of the vector as the axis name (...vector... =@ ). The convention is that the property name is the name of the axis, or starts with the name of the axis followed by . and some suffix.
  • Specifying an explicit axis name (...vector... =@ axis ) ignoring the vector property name.

When the values of a vector are entries in some axis, we can use it to look up some property based on it. For simple lookups the =@ can be omitted (e.g. @ cell : metacell ). This can be chained ( @ cell : metacell : type : color ). When grouping a vector, or the rows or columns of a matrix, explicitly associating an axis with the values creates a group for each axis entry, in the axis order, so that the result is a proper values vector for the axis ( @ metacell / type =@ >> Count ).

metacells = example_metacells_daf()
metacells["@ metacell : type =@ : color"]

# output

7-element Named Vector{String}
metacell  │
──────────┼────────────
M1671.28  │      "gold"
M2357.20  │      "gold"
M2169.56  │      "plum"
M2576.86  │   "#eebb6e"
M1440.15  │      "gold"
M756.63   │   "#eebb6e"
M412.08   │ "steelblue"

metacells = example_metacells_daf()
metacells["@ metacell : type =@ type : color"]

# output

7-element Named Vector{String}
metacell  │
──────────┼────────────
M1671.28  │      "gold"
M2357.20  │      "gold"
M2169.56  │      "plum"
M2576.86  │   "#eebb6e"
M1440.15  │      "gold"
M756.63   │   "#eebb6e"
M412.08   │ "steelblue"

Syntax diagram:

[Diagram: VECTOR AS AXIS. Each phrase below, as labeled in the original figure, leads from the vector values state to the vector axis state.]

  • vector values are entries of axis: @ axis
    Axis(axis)
  • vector property is an axis: @
    Axis()

DataAxesFormats.Queries.VECTOR_LOOKUP Constant

A query fragment specifying looking up vector properties. Valid phrases are:

  • Looking up a vector property based on an axis (...axis... : vector-property ). Example:
metacells = example_metacells_daf()
metacells["@ metacell : type"]

# output

7-element Named SparseArrays.ReadOnly{String, 1, Vector{String}}
metacell  │
──────────┼───────────
M1671.28  │      "MPP"
M2357.20  │      "MPP"
M2169.56  │ "MEBEMP-L"
M2576.86  │ "MEBEMP-E"
M1440.15  │      "MPP"
M756.63   │ "MEBEMP-E"
M412.08   │ "memory-B"

This can be further embellished:

  • Looking up a matrix property based on an axis, and slicing a column based on an explicit entry of the other axis of the matrix (...axis... :: matrix-property @ columns-axis = columns-axis-entry ). Example:
metacells = example_metacells_daf()
metacells["@ gene :: fraction @ metacell = M412.08"]

# output

683-element Named Vector{Float32}
gene         │
─────────────┼────────────
RPL22        │  0.00373581
PARK7        │  6.50531f-5
ENO1         │  4.22228f-5
PRDM2        │ 0.000151486
HP1BP3       │  0.00012099
CDC42        │ 0.000176377
HNRNPR       │   6.7083f-5
RPL11        │   0.0124251
⋮                        ⋮
NRIP1        │  2.79487f-5
ATP5PF       │  8.22312f-5
CCT8         │  4.13243f-5
SOD1         │ 0.000103708
SON          │  0.00032361
ATP5PO       │  9.73498f-5
TTC3         │ 0.000122469
HMGN1        │ 0.000160654

  • Looking up a square matrix property, and slicing a column based on an explicit entry of the (column) axis of the matrix (...axis... :: square-matrix-property @| column-axis-entry ).
metacells = example_metacells_daf()
# Outgoing weights from the M412.08 metacell.
metacells["@ metacell :: edge_weight @| M412.08"]

# output

7-element Named Vector{Float32}
metacell  │
──────────┼────
M1671.28  │ 0.0
M2357.20  │ 0.0
M2169.56  │ 0.0
M2576.86  │ 0.0
M1440.15  │ 0.5
M756.63   │ 0.1
M412.08   │ 0.0

  • Looking up a square matrix property, and slicing a row based on an explicit entry of the (row) axis of the matrix (...axis... :: square-matrix-property @- row-axis-entry ).
metacells = example_metacells_daf()
# Incoming weights into the M412.08 metacell.
metacells["@ metacell :: edge_weight @- M412.08"]

# output

7-element Named Vector{Float32}
metacell  │
──────────┼────
M1671.28  │ 0.0
M2357.20  │ 0.0
M2169.56  │ 0.1
M2576.86  │ 0.0
M1440.15  │ 0.0
M756.63   │ 0.9
M412.08   │ 0.0

In all of these, the lookup operation ( : , :: ) can be followed by || default-value to specify a value to use if the property we look up doesn't exist (...vector... : vector-property || default-value , ...vector... :: square-matrix-property || default-value @| column-entry ).

If the base axis is the result of looking up some property, then some of the entries may have an empty string value. Looking up the vector property based on this will cause an error. To overcome this, you can request that these entries will be masked out of the result by prefixing the query with ?? (...vector... ?? : vector-property , ...vector... ?? :: matrix-property ... ), or specify the final value of these entries (...vector... ?? final-value : vector-property , ...vector... ?? final-value :: matrix-property ... ). Since it is possible to chain lookup operations (see VECTOR_OPERATION ), the final value is applied at the end of the lookup chain ( ?? final-value : vector-property-which-holds-axis-entries : vector-property-of-that-axis-which-holds-another-axis-entries : vector-property-of-the-other-axis ).

Syntax diagram:

[Diagram: VECTOR LOOKUP. Each phrase below leads from the vector axis state to the vector values state. Operation names are as labeled in the original figure.]

  • lookup square matrix row by vector: ?? final_value :: matrix_property || default_value @- row_name
    IfEmpty(final_value) |> LookupMatrix(matrix_property) |> IfMissing(default_value) |> RowIsEqual(row_name)
  • lookup square matrix column by vector: ?? final_value :: matrix_property || default_value @| column_name
    IfEmpty(final_value) |> LookupMatrix(matrix_property) |> IfMissing(default_value) |> ColumnIsEqual(column_name)
  • lookup matrix column by vector: ?? final_value :: matrix_property || default_value @ column_axis = column_name
    IfEmpty(final_value) |> LookupMatrix(matrix_property) |> IfMissing(default_value) |> Axis(column_axis) |> IsEqual(column_name)
  • lookup vector by vector: ?? final_value : vector_property || default_value
    IfEmpty(final_value) |> LookupVector(vector_property) |> IfMissing(default_value)

DataAxesFormats.Queries.VECTOR_MASK Constant

A query fragment specifying a mask to apply to an axis. Valid phrases are:

  • Beginning a mask by looking up some vector property for each entry (...axis... [ vector-property , ...axis... [ ! vector-property ) - see VECTOR_MASK_LOOKUP .
  • Applying some operation to a vector we looked up (...mask... > value ) - see VECTOR_OPERATION .
  • Combining the mask with another one (...mask... & ...mask..., ...mask... & ! ...mask...) - see VECTOR_MASK_OPERATION .
  • Ending the mask (...mask... ] ).

Syntax diagram:

[Diagram: VECTOR MASK. Transitions from the base vector axis state, through a mask values state (a stack holding the base axis, BeginMask or BeginNegatedMask, and the mask values), to the final vector axis state.]

  • VECTOR MASK LOOKUP leads from the base axis to the mask values state.
  • VECTOR MASK OPERATION leads from the mask values state back to the mask values state.
  • VECTOR OPERATION leads from the mask values state back to the mask values state.
  • apply mask: ] , that is EndMask() , leads from the mask values state to the final axis.

DataAxesFormats.Queries.VECTOR_MASK_LOOKUP Constant

A query fragment specifying looking up a vector for a mask to apply to an axis. Valid phrases are similar to VECTOR_LOOKUP , except that they start with [ instead of : (starting with [ ! reverses the mask). Example:

cells = example_cells_daf()
cells["@ gene [ ! is_lateral ]"]

# output

245-element Named Vector{SubString{StringViews.StringView{Vector{UInt8}}}}
gene       │
───────────┼────────────
ENO1       │      "ENO1"
PRDM2      │     "PRDM2"
HP1BP3     │    "HP1BP3"
HNRNPR     │    "HNRNPR"
RSRP1      │     "RSRP1"
KHDRBS1    │   "KHDRBS1"
THRAP3     │    "THRAP3"
SMAP2      │     "SMAP2"
⋮                      ⋮
MYADM      │     "MYADM"
DDT        │       "DDT"
UQCR10     │    "UQCR10"
EIF3L      │     "EIF3L"
TNRC6B     │    "TNRC6B"
TNFRSF13C  │ "TNFRSF13C"
SOD1       │      "SOD1"
ATP5PO     │    "ATP5PO"

Syntax diagram:

[Diagram: VECTOR MASK LOOKUP. Each phrase below leads from the base vector axis state to the mask values state (a stack holding the base axis, BeginMask or BeginNegatedMask, and the mask values). The ! is optional; with it, BeginNegatedMask is used instead of BeginMask. Operation names are as labeled in the original figure.]

  • lookup square matrix row mask: [ ! matrix_property || default_value @- row_name
    BeginMask or BeginNegatedMask (matrix_property) |> IfMissing(default_value) |> RowIsEqual(row_name)
  • lookup square matrix column mask: [ ! matrix_property || default_value @| column_name
    BeginMask or BeginNegatedMask (matrix_property) |> IfMissing(default_value) |> ColumnIsEqual(column_name)
  • lookup matrix column mask: [ ! matrix_property || default_value @ column_axis = column_name
    BeginMask or BeginNegatedMask (matrix_property) |> IfMissing(default_value) |> IsEqual(column_name)
  • lookup vector mask: [ ! vector_property || default_value
    BeginMask or BeginNegatedMask (vector_property) |> IfMissing(default_value)

DataAxesFormats.Queries.VECTOR_MASK_OPERATION Constant

A query fragment specifying combining a mask with a second mask. Valid phrases are similar to VECTOR_MASK_LOOKUP , except that they start with the logical combination operator ( & , | , ^ ), with an optional ! suffix for negating the second mask. Operations are evaluated in order (left to right). Example:

cells = example_cells_daf()
cells["@ donor [ age > 60 & sex = male ]"]

# output

29-element Named Vector{SubString{StringViews.StringView{Vector{UInt8}}}}
donor  │
───────┼───────
N16    │  "N16"
N17    │  "N17"
N59    │  "N59"
N86    │  "N86"
N88    │  "N88"
N91    │  "N91"
N92    │  "N92"
N95    │  "N95"
⋮             ⋮
N163   │ "N163"
N164   │ "N164"
N169   │ "N169"
N172   │ "N172"
N174   │ "N174"
N175   │ "N175"
N179   │ "N179"
N181   │ "N181"
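
The left-to-right evaluation order matters once different operators are mixed. The following is a plain-Julia sketch, independent of the package; the combine_masks function, the is_control vector and all the data are made up for illustration. It contrasts the in-order folding described above with Julia's own operator precedence, where & binds tighter than | .

```julia
# Plain-Julia sketch of left-to-right mask combination; names and data are made up.
is_control = [true, false, false, false]
age = [30, 72, 66, 45]
sex = ["female", "male", "female", "male"]

# Fold the masks strictly in order, ignoring operator precedence.
function combine_masks(first_mask, steps)
    mask = first_mask
    for (combine, other) in steps
        mask = combine.(mask, other)
    end
    return mask
end

# Mirrors a mask such as `[ is_control | age > 60 & sex = male ]`.
left_to_right = combine_masks(is_control, [(|, age .> 60), (&, sex .== "male")])

# What Julia itself computes for the same operators: a | (b & c).
by_precedence = is_control .| (age .> 60) .& (sex .== "male")

println(left_to_right)
println(by_precedence)
```

The two results differ in the first entry, which is a control but not male: folding in order excludes it, while precedence-based evaluation keeps it.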

Syntax diagram:

[Diagram: VECTOR MASK OPERATION. Each lookup phrase below leads from the base mask values state (a stack holding the base axis, BeginMask or BeginNegatedMask, and the base mask values) to a mask operands state, which adds the combination operator and the other mask values to the stack. The ! is optional and adds Not to the operation.]

  • lookup square matrix row other mask: & or | or ^ ! matrix_property || default_value @- row_value
    And or Or or Xor Not (matrix_property) |> IfMissing(default_value) |> RowIsEqual(row_value)
  • lookup square matrix column other mask: & or | or ^ ! matrix_property || default_value @| column_value
    And or Or or Xor Not (matrix_property) |> IfMissing(default_value) |> ColumnIsEqual(column_value)
  • lookup matrix column other mask: & or | or ^ ! matrix_property || default_value @ columns_axis = column_value
    And or Or or Xor Not (matrix_property) |> IfMissing(default_value) |> Axis(columns_axis) |> IsEqual(column_value)
  • lookup vector other mask: & or | or ^ ! vector_property || default_value
    And or Or or Xor Not (vector_property) |> IfMissing(default_value)

From the mask operands state:

  • VECTOR OPERATION leads back to the mask operands state.
  • "compute mask operation" (implicit) leads to the final mask values state.

DataAxesFormats.Queries.VECTOR_GROUP Constant

A query fragment for grouping vector values by some property and computing a single value per group. Valid phrases for fetching the group values are similar to VECTOR_LOOKUP but start with a / instead of : . This can be followed by any VECTOR_OPERATION (in particular, additional lookups). Once the final group value is established for each vector entry, the values of all entries with the same group value are reduced using a ReductionOperation to a single value. The result vector has this reduced value per group. E.g., @ cell : age / type >> Mean .

chain = example_chain_daf()
chain["@ cell : donor : age / metacell ?? : type >> Mean"]

# output

4-element Named Vector{Float32}
A        │
─────────┼────────
MEBEMP-E │ 63.9767
MEBEMP-L │ 63.9524
MPP      │  64.238
memory-B │ 62.3077

By default the result vector is sorted by the group value (this is also used as the name in the result NamedArray ). Specifying a VECTOR_AS_AXIS before the reduction operation changes this to require that the group values be entries in some axis. In this case the result vector will have one entry for each entry of the axis, in the axis order. If some axis entries do not have any vector values associated with them, then the reduction will fail (e.g. "mean of an empty vector"). In this case, you should specify a default value for the reduction. E.g., @ cell : age / type =@ >> Mean || 0 . Example:

chain = example_chain_daf()
chain["@ cell [ metacell ?? : type != memory-B ] : donor : age / metacell : type =@ >> Mean || 0"]

# output

4-element Named Vector{Float32}
type     │
─────────┼────────
memory-B │     0.0
MEBEMP-E │ 63.9767
MEBEMP-L │ 63.9524
MPP      │  64.238
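
The grouping semantics described above can be illustrated outside Daf with a minimal sketch in plain Julia. The helper name group_reduce and the toy values are hypothetical; this is not the library's implementation, only a reading of the text: groups are sorted by value unless an explicit axis order is given, and an empty group needs a default.

```julia
using Statistics

# Hypothetical helper (not part of Daf): reduce `data` per group.
# Without `axis_entries`, groups are sorted by their value. With it, the
# result follows the axis order and empty groups take `default`.
function group_reduce(reduction, data, groups; axis_entries = nothing, default = nothing)
    group_names = axis_entries === nothing ? sort(unique(groups)) : axis_entries
    reduced = map(group_names) do name
        members = data[groups .== name]
        if isempty(members)
            default === nothing && error("reduction of an empty group: $(name)")
            return default
        end
        return reduction(members)
    end
    return group_names, reduced
end

ages = [60.0, 62.0, 70.0, 64.0]
types = ["MPP", "MPP", "MEBEMP-E", "MEBEMP-E"]

# Default behavior: one entry per observed group, sorted by group value.
group_names, means = group_reduce(mean, ages, types)

# Axis order, with a default for the entry that has no members.
axis_names, axis_means = group_reduce(
    mean, ages, types;
    axis_entries = ["memory-B", "MEBEMP-E", "MPP"], default = 0.0,
)
```

Omitting default in the second call raises an error for the empty memory-B group, which mirrors the "mean of an empty vector" failure mentioned above.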

Syntax diagram:

Diagram summary (VECTOR GROUP): starting from the base vector values, the group values are looked up by one of:

  • A square matrix row: / matrix_property || default_value @- row_value , that is, GroupBy(matrix_property) |> IfMissing(default_value) |> RowIsEqual(row_value) .
  • A square matrix column: / matrix_property || default_value @| column_value , that is, GroupBy(matrix_property) |> IfMissing(default_value) |> ColumnIsEqual(column_value) .
  • A matrix column: / matrix_property || default_value @ columns_axis = column_value , that is, GroupBy(matrix_property) |> IfMissing(default_value) |> Axis(columns_axis) |> IsEqual(column_value) .
  • A vector: / vector_property || default_value , that is, GroupBy(vector_property) |> IfMissing(default_value) .

The group values may be modified by any VECTOR OPERATION , optionally followed by VECTOR AS AXIS . The grouped vector is then reduced using >> Reduction param value || default_value , that is, Reduction( param = value, ... ) |> IfMissing(default_value) , giving the final vector values.

DataAxesFormats.Queries.VECTOR_FROM_MATRIX Constant

A query fragment for reducing a matrix to a vector. Valid phrases are:

  • Reduce each row into a single value, resulting in a column vector with an entry for each row of the matrix (...matrix... >| ReductionOperation ... ). Example:
metacells = example_metacells_daf()
metacells["@ metacell @ gene :: fraction >| Max"]

# output

7-element Named Vector{Float32}
metacell  │
──────────┼──────────
M1671.28  │  0.023321
M2357.20  │ 0.0233425
M2169.56  │ 0.0219235
M2576.86  │ 0.0236719
M1440.15  │ 0.0227677
M756.63   │ 0.0249121
M412.08   │ 0.0284936

  • Reduce each column into a single value, resulting in a row vector with an entry for each column of the matrix (...matrix... >- ReductionOperation ... ). Example:
metacells = example_metacells_daf()
metacells["@ metacell @ gene :: fraction >- Max"]

# output

683-element Named Vector{Float32}
gene         │
─────────────┼────────────
RPL22        │  0.00474096
PARK7        │ 0.000154199
ENO1         │ 0.000533887
PRDM2        │ 0.000151486
HP1BP3       │ 0.000248206
CDC42        │ 0.000207847
HNRNPR       │ 0.000129013
RPL11        │   0.0124251
⋮                        ⋮
NRIP1        │ 0.000361428
ATP5PF       │ 0.000170554
CCT8         │ 0.000142851
SOD1         │ 0.000177344
SON          │  0.00032361
ATP5PO       │  0.00018833
TTC3         │ 0.000144736
HMGN1        │ 0.000415481
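
In the two examples above, >| Max yields one value per metacell (the rows axis) and >- Max yields one value per gene (the columns axis). A minimal plain Julia sketch of the same two reductions, using a toy matrix rather than the example data:

```julia
# Toy matrix: 2 rows (think metacells) by 3 columns (think genes).
fractions = [0.1 0.4 0.2;
             0.3 0.0 0.5]

# Reduce each row to a single value: one entry per row.
# This mirrors the shape of the `>| Max` result.
per_row = vec(maximum(fractions; dims = 2))

# Reduce each column to a single value: one entry per column.
# This mirrors the shape of the `>- Max` result.
per_column = vec(maximum(fractions; dims = 1))
```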

Syntax diagram:

Diagram summary (VECTOR FROM MATRIX): starting from the matrix values, the vector values are computed by one of:

  • Reducing the matrix to a row: >- Reduction param value ... || default_value , that is, ReduceToRow(Reduction( param, value, ... )) |> IfMissing(default_value) .
  • Reducing the matrix to a column: >| Reduction param value ... || default_value , that is, ReduceToColumn(Reduction( param, value, ... )) |> IfMissing(default_value) .

DataAxesFormats.Queries.MATRIX_QUERY Constant

A query returning a matrix result. Valid phrases are:

  • Lookup a matrix property after specifying its rows and columns axes (...rows axis... ...columns axis... :: matrix-property , ...rows axis... ...columns axis... :: matrix-property || default-value ). Example:
cells = example_cells_daf()
cells["@ cell @ gene :: UMIs"]

# output

856×683 Named Matrix{UInt8}
                        cell ╲ gene │        RPL22  …         HMGN1
────────────────────────────────────┼──────────────────────────────
demux_07_12_20_1_AACAAGATCCATTTCA-1 │         0x0c  …          0x02
demux_07_12_20_1_AACGAAAGTCCAATCA-1 │         0x08             0x01
demux_07_12_20_1_AAGACAAAGTTCCGTA-1 │         0x03             0x03
demux_07_12_20_1_AGACTCATCTATTGTC-1 │         0x08             0x01
demux_07_12_20_1_AGATAGACATTCCTCG-1 │         0x08             0x00
demux_07_12_20_1_ATCGTAGTCCAGTGCG-1 │         0x0e             0x02
demux_07_12_20_1_CACAGGCGTCCTACAA-1 │         0x0b             0x03
demux_07_12_20_1_CCTACGTAGCCAACCC-1 │         0x03             0x01
⋮                                                ⋮  ⋱             ⋮
demux_11_04_21_2_GGGTCACCACCACATA-1 │         0x05             0x03
demux_11_04_21_2_TACAACGGTTACACAC-1 │         0x01             0x00
demux_11_04_21_2_TAGAGTCAGAACGCGT-1 │         0x09             0x00
demux_11_04_21_2_TGATGCAAGGCCTGCT-1 │         0x07             0x00
demux_11_04_21_2_TGCCGAGAGTCGCGAA-1 │         0x01             0x00
demux_11_04_21_2_TGCTGAAAGCCGCACT-1 │         0x01             0x03
demux_11_04_21_2_TTCAGGACAGGAATAT-1 │         0x06             0x00
demux_11_04_21_2_TTTAGTCGTCTAGTGT-1 │         0x06  …          0x00

  • Given a vector of values, look up another vector of the same size and generate a matrix of the number of times each combination of values appears (...vector... * vector-property ... ) - see MATRIX_COUNT , which includes an example.

A matrix can then be modified by applying any MATRIX_OPERATION to it.

Syntax diagram:

Diagram summary (MATRIX): the matrix values are obtained by one of:

  • Looking up a matrix, given a rows axis and a columns axis: :: matrix_property || default_value , that is, LookupMatrix(matrix_property) |> IfMissing(default_value) .
  • Applying a MATRIX COUNT to vector values (or an axis).

The matrix values may then be modified by any MATRIX OPERATION .

DataAxesFormats.Queries.MATRIX_COUNT Constant

A query fragment for computing a matrix of the number of times a combination of values appears in the same index in the first and second vectors. Valid phrases are similar to VECTOR_LOOKUP except they start with * instead of : . This can be followed by any VECTOR_OPERATION for computing the final second vector. E.g., @ cell : age * metacell : type . Example:

cells = example_cells_daf()
cells["@ cell : experiment * donor : sex"]

# output

23×2 Named Matrix{UInt16}
           A ╲ B │ female    male
─────────────────┼───────────────
demux_01_02_21_1 │ 0x0017  0x000e
demux_01_02_21_2 │ 0x000a  0x001a
demux_01_03_21_1 │ 0x0012  0x001b
demux_04_01_21_1 │ 0x0013  0x0016
demux_04_01_21_2 │ 0x0006  0x0012
demux_07_03_21_1 │ 0x000a  0x0016
demux_07_03_21_2 │ 0x000d  0x001b
demux_07_12_20_1 │ 0x0006  0x0011
⋮                       ⋮       ⋮
demux_21_02_21_1 │ 0x0012  0x0005
demux_21_02_21_2 │ 0x0009  0x002a
demux_21_12_20_1 │ 0x001e  0x0005
demux_21_12_20_2 │ 0x0000  0x0026
demux_22_02_21_1 │ 0x0012  0x0009
demux_22_02_21_2 │ 0x001c  0x0013
demux_28_12_20_1 │ 0x0018  0x0022
demux_28_12_20_2 │ 0x003f  0x0009

By default, the matrix rows and columns are sorted by the unique values. Explicitly specifying VECTOR_AS_AXIS for either the first or second vector will change the rows or columns to the axis entries in the right (axis) order. This may create rows or columns with all-zero values. E.g., @ cell : batch =@ * metacell : type =@ . Example:

cells = example_cells_daf()
cells["@ cell : experiment =@ * donor : sex"]

# output

23×2 Named Matrix{UInt16}
  experiment ╲ B │ female    male
─────────────────┼───────────────
demux_01_02_21_1 │ 0x0017  0x000e
demux_01_02_21_2 │ 0x000a  0x001a
demux_01_03_21_1 │ 0x0012  0x001b
demux_04_01_21_1 │ 0x0013  0x0016
demux_04_01_21_2 │ 0x0006  0x0012
demux_07_03_21_1 │ 0x000a  0x0016
demux_07_03_21_2 │ 0x000d  0x001b
demux_07_12_20_1 │ 0x0006  0x0011
⋮                       ⋮       ⋮
demux_21_02_21_1 │ 0x0012  0x0005
demux_21_02_21_2 │ 0x0009  0x002a
demux_21_12_20_1 │ 0x001e  0x0005
demux_21_12_20_2 │ 0x0000  0x0026
demux_22_02_21_1 │ 0x0012  0x0009
demux_22_02_21_2 │ 0x001c  0x0013
demux_28_12_20_1 │ 0x0018  0x0022
demux_28_12_20_2 │ 0x003f  0x0009
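
The counting described above can be sketched in plain Julia. The helper count_matrix and the toy values are hypothetical and only illustrate the semantics: by default rows and columns are the sorted unique values, while passing explicit names (standing in for axis entries) fixes the order and may produce all-zero rows or columns.

```julia
# Hypothetical helper (not part of Daf): count how often each pair of
# values appears at the same index of the two vectors.
function count_matrix(
    first_values,
    second_values;
    row_names = sort(unique(first_values)),
    column_names = sort(unique(second_values)),
)
    row_index = Dict(name => index for (index, name) in enumerate(row_names))
    column_index = Dict(name => index for (index, name) in enumerate(column_names))
    counts = zeros(Int, length(row_names), length(column_names))
    for (first_value, second_value) in zip(first_values, second_values)
        counts[row_index[first_value], column_index[second_value]] += 1
    end
    return counts
end

experiments = ["e1", "e1", "e2", "e2", "e2"]
sexes = ["male", "female", "male", "male", "male"]

# Rows: e1, e2. Columns: female, male.
sorted_counts = count_matrix(experiments, sexes)

# Rows in an explicit order, including an entry with no values.
axis_counts = count_matrix(experiments, sexes; row_names = ["e2", "e0", "e1"])
```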

Syntax diagram:

Diagram summary (MATRIX COUNT): starting from vector values (or an axis), the other values are looked up by one of:

  • A square matrix row: * matrix_property || default_value @- row_value , that is, CountBy(matrix_property) |> IfMissing(default_value) |> RowIsEqual(row_value) .
  • A square matrix column: * matrix_property || default_value @| column_value , that is, CountBy(matrix_property) |> IfMissing(default_value) |> ColumnIsEqual(column_value) .
  • A matrix column: * matrix_property || default_value @ columns_axis = column_value , that is, CountBy(matrix_property) |> IfMissing(default_value) |> Axis(columns_axis) |> IsEqual(column_value) .
  • A vector: * vector_property || default_value , that is, CountBy(vector_property) |> IfMissing(default_value) .

The other values may be modified by any VECTOR OPERATION , optionally followed by VECTOR AS AXIS . The count matrix is then computed implicitly, giving the matrix values.

DataAxesFormats.Queries.MATRIX_OPERATION Constant

A query fragment specifying some operation on a matrix of values. Valid phrases are:

  • Treating the matrix values as names of some axis entries and looking up some property of that axis (...matrix... @ axis-values-are-entries-of : vector-property-of-that-axis || default-value ) - see VECTOR_AS_AXIS and VECTOR_LOOKUP . While the matrix retains its shape, this shape does not affect the result, so we treat it as a long vector for the purpose of the lookup.

  • Applying some operation to a matrix we looked up (...matrix... % Eltwise ... ). Example:

metacells = example_metacells_daf()
metacells["@ metacell @ gene :: fraction % Log base 2 eps 1e-5"]

# output

7×683 Named Matrix{Float32}
metacell ╲ gene │        RPL22         PARK7  …          TTC3         HMGN1
────────────────┼──────────────────────────────────────────────────────────
M1671.28        │     -7.80014      -13.3582  …      -13.0011      -11.4571
M2357.20        │     -7.91664      -12.5723          -13.009      -11.7136
M2169.56        │     -7.71757      -13.0192         -13.0406      -11.1986
M2576.86        │      -7.8198      -12.8843         -12.6579      -11.5767
M1440.15        │     -7.77472      -12.9433         -13.3506      -11.5629
M756.63         │     -7.84368      -13.0487          -13.148      -11.8308
M412.08         │     -8.06051      -13.7017  …      -12.8821      -12.5166

  • Grouping the matrix rows or columns by something and reducing each group to a single one (...matrix... -/ vector-property >- Sum , ...matrix... |/ vector-property >| Sum ) - see MATRIX_GROUP .
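
As a rough illustration of the element-wise example above ( % Log base 2 eps 1e-5 ), the following plain Julia sketch applies a logarithm to every matrix element while keeping the matrix shape. It assumes that eps is added to each value before taking the logarithm; that is an assumption about the Log operation, not something stated in this section, and log_with_eps is a hypothetical name.

```julia
# Hypothetical stand-in for an element-wise Log operation.
# Assumption: `eps` is added to the input before taking the logarithm.
log_with_eps(value; base = ℯ, eps = 0.0) = log(base, value + eps)

fractions = [0.0 0.25;
             0.5 1.0]

# Broadcasting applies the operation to each element; the shape is kept.
logged = log_with_eps.(fractions; base = 2, eps = 1e-5)
```

With a positive eps, zero entries map to a finite (large negative) value instead of -Inf.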

Syntax diagram:

Diagram summary (MATRIX OPERATION): starting from the base matrix values, the final matrix values are computed by one of:

  • An element-wise operation: % Eltwise param value ... , that is, Eltwise( param = value, ... ) .
  • A MATRIX GROUP .
  • Treating the matrix values as entries of an axis, either explicitly using VECTOR AS AXIS or implicitly (when the matrix property is an axis), followed by a VECTOR LOOKUP .

DataAxesFormats.Queries.MATRIX_GROUP Constant

A query fragment for grouping rows or columns by some property and computing a single one per group. Valid phrases for fetching the group values are similar to VECTOR_LOOKUP but start with a -/ or |/ instead of : . This can be followed by any VECTOR_OPERATION (in particular, additional lookups). Once the final group value is established for each row or column entry, the values of all entries with the same group value are reduced using a ReductionOperation to a single value. The result matrix has this reduced value per group. E.g., @ cell @ gene :: UMIs -/ metacell : type >- Sum . The reduction operation must match the group operation ( -/ ... >- , |/ ... >| ). Example:

metacells = example_metacells_daf()
metacells["@ metacell @ gene :: fraction -/ type >- Mean"]

# output

4×683 Named Matrix{Float32}
A ╲ gene │        RPL22         PARK7  …          TTC3         HMGN1
─────────┼──────────────────────────────────────────────────────────
MEBEMP-E │   0.00437961   0.000115139  …   0.000122451   0.000290955
MEBEMP-L │   0.00474096   0.000110458      0.000108683   0.000415481
MPP      │   0.00438723   0.000118797      0.000103007   0.000317991
memory-B │   0.00373581    6.50531f-5  …   0.000122469   0.000160654

By default groups are sorted by their unique values. Explicitly specifying VECTOR_AS_AXIS for the group will change the rows or columns to the axis entries in the right (axis) order. This may create rows or columns with all-zero values.

By default the result is sorted by the group value (this is also used as the name in the result NamedArray ). Specifying a VECTOR_AS_AXIS before the reduction operation changes this to require that the group values be entries in some axis. In this case the result will have one entry for each entry of the axis, in the axis order. Example:

metacells = example_metacells_daf()
metacells["@ metacell @ gene :: fraction -/ type =@ >- Mean"]

# output

4×683 Named Matrix{Float32}
type ╲ gene │        RPL22         PARK7  …          TTC3         HMGN1
────────────┼──────────────────────────────────────────────────────────
memory-B    │   0.00373581    6.50531f-5  …   0.000122469   0.000160654
MEBEMP-E    │   0.00437961   0.000115139      0.000122451   0.000290955
MEBEMP-L    │   0.00474096   0.000110458      0.000108683   0.000415481
MPP         │   0.00438723   0.000118797  …   0.000103007   0.000317991
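
The row grouping shown in the examples above ( -/ type >- Mean ) can be sketched in plain Julia as follows. The toy matrix and names are hypothetical and this is not how Daf computes the result; it only shows that all rows with the same group value collapse into a single row.

```julia
using Statistics

# Toy matrix: 3 rows (think metacells) by 2 columns (think genes).
fractions = [1.0 2.0;
             3.0 4.0;
             5.0 8.0]
row_types = ["MPP", "memory-B", "MPP"]

# Default order: the sorted unique group values.
group_names = sort(unique(row_types))

# One row per group, holding the per-column mean of that group's rows.
grouped = reduce(
    vcat,
    [mean(fractions[row_types .== name, :]; dims = 1) for name in group_names],
)
```

Iterating over the entries of an axis instead of sort(unique(row_types)) would give the axis order shown in the second example.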

If some axis entries do not have any values associated with them, then the reduction will fail (e.g. "mean of an empty row/column vector"). In this case, you should specify a default value for the reduction. E.g., @ cell @ gene :: UMIs -/ metacell : type =@ >- Sum || 0 .

Syntax diagram:

Diagram summary (MATRIX GROUP): starting from the base matrix values, the group values of the matrix columns ( |/ , GroupColumnsBy ) or rows ( -/ , GroupRowsBy ) are looked up by one of:

  • A square matrix row: -/ matrix_property || default_value @- row_value , that is, GroupRowsBy(matrix_property) |> IfMissing(default_value) |> RowIsEqual(row_value) .
  • A square matrix column: -/ matrix_property || default_value @| column_value , that is, GroupRowsBy(matrix_property) |> IfMissing(default_value) |> ColumnIsEqual(column_value) .
  • A matrix column: -/ matrix_property || default_value @ columns_axis = column_value , that is, GroupRowsBy(matrix_property) |> IfMissing(default_value) |> Axis(columns_axis) |> IsEqual(column_value) .
  • A vector: -/ vector_property || default_value , that is, GroupRowsBy(vector_property) |> IfMissing(default_value) .

The group values may be modified by any VECTOR OPERATION , optionally followed by VECTOR AS AXIS . The grouped columns or rows are then reduced using >| or >- Reduction param value || default_value , that is, ReduceToColumn or ReduceToRow (Reduction( param = value, ... )) |> IfMissing(default_value) , giving the final matrix values.

Functions

DataAxesFormats.Queries.get_query Function
get_query(
    daf::DafReader,
    query::QueryString;
    [cache::Bool = true]
)::Union{AbstractSet{<:AbstractString}, StorageScalar, NamedVector, NamedMatrix}

query |> get_query(
    daf::DafReader;
    cache::Bool = true,
)

Apply the full query to the Daf data and return the result. By default, this will cache the final query result, so repeated identical queries will be accelerated. This may consume a large amount of memory. You can disable it by specifying cache = false , or release the cached data using empty_cache! .

As a shorthand syntax you can also invoke this using getindex , that is, using the [] operator (e.g., daf["@ cell"] is equivalent to get_query(daf, "@ cell"; cache = false) ). Finally, you can use |> to invoke the query, which is especially useful when constructing it from the operations (e.g., Axis("cell") |> get_query(daf) ), or even from a string (e.g., "@ cell" |> get_query(daf) ).

Note

Using get_query , the query is cached (by default). Using [...] , the query is not cached. That is, [...] is mostly used for one-off queries (and in interactive sessions, etc.) while get_query is used for more "fundamental" queries that are expected to be re-used.

DataAxesFormats.Queries.get_frame Function
get_frame(
    daf::DafReader,
    axis::QueryString,
    [columns::Maybe{FrameColumns} = nothing;
    cache::Bool = true]
)::DataFrame

Return a DataFrame containing multiple vectors of the same axis .

The axis can be either just the name of an axis (e.g., "cell" ), or a query for the axis (e.g., q"@ cell" ), possibly using a mask (e.g., q"@ cell [ age > 1 ]" ). The result of the query must be a vector of unique axis entry names.

If columns is not specified, the data frame will contain all the vector properties of the axis, in alphabetical order (since DataFrame has no concept of named rows, the 1st column will contain the name of the axis entry).

By default, this will not cache the results of the queries.

DataAxesFormats.Queries.FrameColumn Type

Specify a column for get_frame for some axis. The most generic form is a pair "column_name" => query . Two shorthands apply: the pair "column_name" => "=" is a shorthand for the pair "column_name" => ": column_name" , and so is the shorthand "column_name" (simple string).

We also allow specifying tuples instead of pairs to make it easy to invoke the API from other languages such as Python which do not have the concept of a Pair .

The query is combined with the axis query using full_vector_query . The (full) query result should be a vector with one value for each entry of the axis query result.
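
The shorthands described above can be illustrated by a small plain Julia sketch that normalizes each accepted form into a "column_name" => query pair. The function normalize_column is hypothetical and is not part of the Daf API.

```julia
# Hypothetical helper (not part of Daf): normalize a column specification
# into a `name => query` pair.

# A simple string is a shorthand for looking up the property of that name.
normalize_column(column::AbstractString) = column => ": " * column

# A tuple is treated the same as a pair.
normalize_column(column::Tuple{AbstractString, AbstractString}) =
    normalize_column(column[1] => column[2])

# The query "=" is a shorthand for looking up the property of that name.
function normalize_column(column::Pair{<:AbstractString, <:AbstractString})
    name, query = column
    return name => (query == "=" ? ": " * name : query)
end

columns = ["age", "sex" => "=", ("kind", ": metacell : type")]
normalized = [normalize_column(column) for column in columns]
```

Note that the mixed literal above is a Vector{Any} , which is the situation the FrameColumns description refers to.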

DataAxesFormats.Queries.FrameColumns Type

Specify all the columns to collect for a frame. We would have liked to specify this as AbstractVector{<:FrameColumn} but Julia in its infinite wisdom considers ["a", "b" => "c"] to be a Vector{Any} , which would require literals to be annotated with the type.

DataAxesFormats.Queries.full_vector_query Function
full_vector_query(
    axis_query::Query,
    vector_query::QueryString,
    vector_name::Maybe{AbstractString} = nothing,
)::Query

Given a query for an axis, and some suffix query for a vector property, combine them into a full query for the vector values for the axis. This is used by FrameColumn for get_frame and also for queries of vector data in views.

Normally we just concatenate the axis query and the vector query. However, similar to defining a view, if the query starts with an axis operator, it may require repeating the axis query in it, so an axis operator with the special __axis__ name is replaced by the axis query.
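
One way to read the paragraph above is sketched below at the level of query strings. This is only an illustration under stated assumptions: the real function works on Query objects, the helper name combine_axis_and_vector is hypothetical, and treating the replacement as a plain text substitution of "@ __axis__" is an assumption.

```julia
# Hypothetical string-level sketch (not the Daf implementation).
function combine_axis_and_vector(
    axis_query::AbstractString,
    vector_query::AbstractString;
    placeholder::AbstractString = "@ __axis__",
)
    if occursin(placeholder, vector_query)
        # The vector query names the axis explicitly: substitute the axis query.
        return replace(vector_query, placeholder => axis_query)
    end
    # The common case: append the vector query to the axis query.
    return axis_query * " " * vector_query
end

simple = combine_axis_and_vector("@ cell [ age > 1 ]", ": donor : sex")
explicit = combine_axis_and_vector("@ cell", "@ __axis__ @ gene :: UMIs >| Sum")
```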

DataAxesFormats.Queries.query_result_dimensions Function
query_result_dimensions(query::QueryString)::Int

Return the number of dimensions (-1 - names, 0 - scalar, 1 - vector, 2 - matrix) of the results of a query . This also verifies the query is syntactically valid, though it may still fail if applied to specific data due to invalid data values or types.

DataAxesFormats.Queries.query_requires_relayout Function
query_requires_relayout(daf::DafReader, query::QueryString)::Bool

Whether computing the query for the daf data requires relayout of some matrix. This also verifies the query is syntactically valid and that the query can be computed, though it may still fail if applied to specific data due to invalid values or types.

DataAxesFormats.Queries.is_axis_query Function
is_axis_query(query::QueryString)::Bool

Returns whether the query specifies a (possibly masked) axis. This also verifies the query is syntactically valid, though it may still fail if applied to specific data due to invalid data values or types.

DataAxesFormats.Queries.guess_typed_value Function
guess_typed_value(value::AbstractString)::StorageScalar

Given a string value, guess the typed value it represents:

  • true and false are assumed to be Bool .
  • Integers are assumed to be Int64 .
  • Floating point numbers are assumed to be Float64 , as are e and pi .
  • Anything else is assumed to be a string.

This doesn't have to be 100% accurate; it is intended to allow omitting the data type in most cases when specifying an IfMissing value. If it guesses wrong, just specify an explicit type (e.g., . version || 1.0 String ).
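
The guessing rules listed above can be approximated in a few lines of plain Julia. The function guess_value below is a hypothetical sketch of those rules, not the library's implementation, and edge cases may differ.

```julia
# Hypothetical sketch of the guessing rules (not the Daf implementation).
function guess_value(value::AbstractString)
    # `true` and `false` are assumed to be Bool.
    value == "true" && return true
    value == "false" && return false

    # Integers are assumed to be Int64.
    as_integer = tryparse(Int64, value)
    as_integer === nothing || return as_integer

    # Floating point numbers, `e` and `pi` are assumed to be Float64.
    value == "e" && return Float64(ℯ)
    value == "pi" && return Float64(pi)
    as_float = tryparse(Float64, value)
    as_float === nothing || return as_float

    # Anything else is assumed to be a string.
    return String(value)
end

guesses = [guess_value(text) for text in ["true", "42", "1.5", "pi", "MPP"]]
```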

Query Operators

Names

Lookup

DataAxesFormats.Queries.AsAxis Type
struct AsAxis <: QueryOperation
    axis_name::Maybe{AbstractString}
end

A query operator for specifying that the values of a property we looked up are the names of entries in some axis. This is used extensively in VECTOR_AS_AXIS .

DataAxesFormats.Queries.SquareColumnIs Type
struct SquareColumnIs <: QueryOperation
    comparison_value::AbstractString
end

Whenever extracting a vector from a square matrix, specify the axis entry that identifies the column to extract. This is used in any phrase that looks up a vector out of a matrix (see VECTOR_QUERY and MATRIX_QUERY ).

Note

Julia and Daf use column-major layout as their default, so this is typically the natural way to extract a vector from a square matrix (e.g., for a square is_in_neighborhood matrix per block per block, the column is the base block and the rows are the other block, so the column vector contains a mask of all the blocks in the neighborhood of the base block).

DataAxesFormats.Queries.SquareRowIs Type
struct SquareRowIs <: QueryOperation
    comparison_value::AbstractString
end

Whenever extracting a vector from a square matrix, specify the axis entry that identifies the row to extract. This is used in any phrase that looks up a vector out of a matrix (see VECTOR_QUERY and MATRIX_QUERY ).

Note

Julia and Daf use column-major layout as their default, so this typically cuts across the natural way to extract a vector from a square matrix (e.g., for a square is_in_neighborhood matrix per block per block, the column is the base block and the rows are the other block, so the row vector contains a mask of all the base blocks that a given block is in the neighborhood of).
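
The difference between extracting a column and extracting a row of a square matrix can be shown with a small plain Julia sketch. The toy is_in_neighborhood matrix and the block names are hypothetical; the layout follows the convention described in the notes above (the column is the base block, the rows are the other blocks).

```julia
block_names = ["B1", "B2", "B3"]
index_of = Dict(name => index for (index, name) in enumerate(block_names))

# is_in_neighborhood[other, base]: is `other` in the neighborhood of `base`?
is_in_neighborhood = Bool[
    1 0 0
    1 1 0
    0 1 1
]

# Column of B1: a mask of the blocks in the neighborhood of B1.
# This is contiguous in memory in Julia's column-major layout.
column_of_b1 = is_in_neighborhood[:, index_of["B1"]]

# Row of B1: a mask of the base blocks whose neighborhood contains B1.
row_of_b1 = is_in_neighborhood[index_of["B1"], :]
```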

DataAxesFormats.Queries.IfNot Type
struct IfNot <: QueryOperation
    final_value::Maybe{StorageScalar}
end

Specify a final value to use when, having looked up some base property values, we use them as axis entry names to lookup another property of that axis. If the base property value is empty, then this is an error. Specifying IfNot without a final_value allows us to mask out that entry from the result instead. Specifying a final_value will use it for the final property value (since there may be an arbitrarily long chain of lookup operations).
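
The three behaviors described above (error, mask out, or use a final value) can be sketched in plain Julia. The helper lookup_through , the :error and :mask symbols, and the toy data are all hypothetical; this only illustrates the semantics for a single lookup step.

```julia
# Hypothetical helper (not part of Daf). `if_not = :error` fails on an empty
# base value, `:mask` drops such entries, and any other value is used as the
# final value for them.
function lookup_through(base_values, property; if_not = :error)
    kept_indices = Int[]
    final_values = Any[]
    for (index, base_value) in enumerate(base_values)
        if base_value != ""
            push!(kept_indices, index)
            push!(final_values, property[base_value])
        elseif if_not === :error
            error("empty base value at index $(index)")
        elseif if_not !== :mask
            push!(kept_indices, index)
            push!(final_values, if_not)
        end
    end
    return kept_indices, final_values
end

metacell_of_cell = ["M1", "", "M2"]
type_of_metacell = Dict("M1" => "MPP", "M2" => "memory-B")

# Mask out the cell that has no metacell.
masked_indices, masked_types = lookup_through(metacell_of_cell, type_of_metacell; if_not = :mask)

# Use a final value for the cell that has no metacell.
all_indices, all_types = lookup_through(metacell_of_cell, type_of_metacell; if_not = "Outlier")
```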

Masks

Comparisons

DataAxesFormats.Queries.IsMatch Type
struct IsMatch <: QueryOperation
    comparison_value::StorageScalar
end

Convert a vector of values to a vector of Booleans, which is true for (string!) entries that are a (complete!) match to the comparison_value regular expression (see VECTOR_OPERATION ).

DataAxesFormats.Queries.IsNotMatch Type
struct IsNotMatch <: QueryOperation
    comparison_value::StorageScalar
end

Convert a vector of values to a vector of Booleans, which is true for (string!) entries that are not a (complete!) match to the comparison_value regular expression (see VECTOR_OPERATION ).
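
The emphasis on a complete match matters because a plain regular expression search also succeeds on a partial match. The following plain Julia sketch shows one way to express a complete match, by anchoring the pattern at both ends; the helper is_complete_match is hypothetical and is not necessarily how Daf implements these operators.

```julia
# Hypothetical helper: true only if the whole value matches the pattern.
is_complete_match(value::AbstractString, pattern::AbstractString) =
    match(Regex("^(?:" * pattern * ")\$"), value) !== nothing

gene_names = ["RPL22", "MRPL22", "RPS3", "ENO1"]

# Mirrors IsMatch: a vector of Booleans, one per entry.
is_ribosomal = [is_complete_match(name, "RP[LS].*") for name in gene_names]

# Mirrors IsNotMatch: the negation of the above.
is_not_ribosomal = .!is_ribosomal

# A plain search would also accept the partial match inside "MRPL22".
partial = occursin(r"RP[LS].*", "MRPL22")
```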

Groups

DataAxesFormats.Queries.CountBy Type
struct CountBy <: QueryOperation
    property_name::AbstractString
end

Specify a second property for each vector entry, to compute a matrix of counts of the entries with each combination of values (see MATRIX_COUNT ).

Index