Matrix layouts
TanayLabUtilities.MatrixLayouts — Module
Matrix data that has a clear layout, that is, a major_axis, regardless of whether it is dense or sparse.
That is, for Columns-major data, the values of each column are laid out consecutively in memory (each column is a single contiguous vector), so any operation that works on whole columns will be fast (e.g., summing the values of each column). In contrast, the values of each row are stored far apart from each other, so any operation that works on whole rows will be very slow in comparison (e.g., summing the values of each row).
For Rows-major data, the values of each row are laid out consecutively in memory (each row is a single contiguous vector). In contrast, the values of each column are stored far apart from each other. In this case, summing columns would be slow, and summing rows would be fast.
This is much simpler than the ArrayLayouts module, which attempts to fully describe the layout of N-dimensional arrays, a much more ambitious goal which is overkill for our needs.
The "default" layout in Julia is column-major, which it inherits from MATLAB, which inherited it from Fortran, allegedly because this is more efficient for some linear algebra operations. In contrast, most systems languages and Python's numpy use row-major layout by default, because that's the sane layout (and generalizes better for tensors). At any rate, all systems work just fine with data of either memory layout; the key consideration is to keep track of the layout, and to apply operations "with the grain" rather than "against the grain" of the data.
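For example, here is a minimal sketch (using only Base operations; the size is arbitrary) of the two access patterns on a plain, column-major Julia matrix:
matrix = rand(1_000, 1_000)  # a plain Julia Matrix is column-major
# "With the grain": each inner sum walks down one contiguous column.
sums_of_columns = [sum(view(matrix, :, column)) for column in 1:size(matrix, 2)]
# "Against the grain": each inner sum gathers values that are far apart in memory
# (one value per column), so on large matrices it is much slower.
sums_of_rows = [sum(view(matrix, row, :)) for row in 1:size(matrix, 1)]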
TanayLabUtilities.MatrixLayouts.Rows — Constant
A symbolic name for the rows axis. It is more readable to write, say, size(matrix, Rows), instead of size(matrix, 1).
TanayLabUtilities.MatrixLayouts.Columns — Constant
A symbolic name for the columns axis. It is more readable to write, say, size(matrix, Columns), instead of size(matrix, 2).
TanayLabUtilities.MatrixLayouts.axis_name — Function
axis_name(axis::Maybe{Integer})::String
Return the name of the axis (for messages).
println(axis_name(nothing))
println(axis_name(Rows))
println(axis_name(Columns))
println(axis_name(3))
# output
nothing
Rows
Columns
ERROR: invalid matrix axis: 3
TanayLabUtilities.MatrixLayouts.major_axis — Function
major_axis(matrix::AbstractMatrix)::Maybe{Int8}
Return the index of the major axis of a matrix, that is, the axis one should keep fixed for an efficient inner loop accessing the matrix elements. If the matrix doesn't support any efficient access axis, returns nothing.
using Test
base = [0 1 2; 3 4 0]
@test major_axis(base) == Columns
# Slice
@test major_axis(@view base[:, [1, 3, 2]]) == Columns
# Named
using NamedArrays
@test major_axis(NamedArray(base)) == Columns
# Permuted
permuted = PermutedDimsArray(base, (2, 1))
@test major_axis(permuted) == Rows
@test flip(permuted) === base
unpermuted = PermutedDimsArray(base, (1, 2))
@test major_axis(unpermuted) == Columns
# LinearAlgebra
transposed = transpose(base)
@test major_axis(transposed) == Rows
@test flip(transposed) === base
adjointed = adjoint(base)
@test major_axis(adjointed) == Rows
# Sparse
using SparseArrays
sparse = SparseMatrixCSC(base)
@test major_axis(sparse) == Columns
println("OK")
# output
OK
TanayLabUtilities.MatrixLayouts.require_major_axis — Function
require_major_axis(matrix::AbstractMatrix)::Int8
Similar to major_axis, but will error if the matrix isn't in either row-major or column-major layout.
using Test
base = [0 1 2; 3 4 0]
@test require_major_axis(base) == Columns
@test require_major_axis(@view base[:, [1, 3, 2]]) == Columns
println("OK")
# output
OK
TanayLabUtilities.MatrixLayouts.minor_axis — Function
minor_axis(matrix::AbstractMatrix)::Maybe{Int8}
Return the index of the minor axis of a matrix, that is, the axis one should vary for an efficient inner loop accessing the matrix elements. If the matrix doesn't support any efficient access axis, returns nothing.
using Test
base = [0 1 2; 3 4 0]
@test minor_axis(base) == Rows
# Slice
@test minor_axis(@view base[:, [1, 3, 2]]) == Rows
# Named
using NamedArrays
@test minor_axis(NamedArray(base)) == Rows
# Permuted
permuted = PermutedDimsArray(base, (2, 1))
@test minor_axis(permuted) == Columns
@test flip(permuted) === base
unpermuted = PermutedDimsArray(base, (1, 2))
@test minor_axis(unpermuted) == Rows
# LinearAlgebra
transposed = transpose(base)
@test minor_axis(transposed) == Columns
@test flip(transposed) === base
adjointed = adjoint(base)
@test minor_axis(adjointed) == Columns
@test flip(adjointed) === base
# Sparse
using SparseArrays
sparse = SparseMatrixCSC(base)
@test minor_axis(sparse) == Rows
println("OK")
# output
OK
TanayLabUtilities.MatrixLayouts.require_minor_axis — Function
require_minor_axis(matrix::AbstractMatrix)::Int8
Similar to minor_axis, but will error if the matrix isn't in either row-major or column-major layout.
using Test
base = [0 1 2; 3 4 0]
@test require_minor_axis(base) == Rows
@test require_minor_axis(@view base[:, [1, 3, 2]]) == Rows
println("OK")
# output
OK
TanayLabUtilities.MatrixLayouts.other_axis — Function
other_axis(axis::Maybe{Integer})::Maybe{Int8}
Return the other matrix axis (that is, convert between Rows and Columns). If given nothing, returns nothing.
using Test
@test other_axis(nothing) === nothing
@test other_axis(Rows) == Columns
@test other_axis(Columns) == Rows
other_axis(3)
# output
ERROR: invalid matrix axis: 3
TanayLabUtilities.MatrixLayouts.flip — Function
flip(AbstractMatrix)::AbstractMatrix
Flip the axes of a matrix. This applies PermutedDimsArray to the matrix. However, unlike the standard Julia functions, which in their infinite wisdom blindly add a wrapper on top of the matrix, this function looks at the input first. As long as it is not a matrix of Complex values, it will strip away an existing Transpose and/or Adjoint and/or PermutedDimsArray wrapper, instead of attaching an additional one. In addition, flip will also look inside a ReadOnlyArray and/or NamedArray to cancel out an internal flip wrapper.
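For example, a small sketch based on the behaviors shown in the major_axis examples above:
base = [0 1 2; 3 4 0]
# Flipping a plain matrix attaches a (zero-copy) PermutedDimsArray wrapper:
@assert flip(base) == transpose(base)
@assert major_axis(flip(base)) == Rows
# Flipping an already-wrapped matrix strips the wrapper instead of adding another:
@assert flip(transpose(base)) === base
@assert flip(PermutedDimsArray(base, (2, 1))) === base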
TanayLabUtilities.MatrixLayouts.flipped — Function
flipped(matrix::AbstractMatrix)::AbstractMatrix
Return a transpose of a matrix, but instead of simply using a zero-copy wrapper, it actually rearranges the data. See relayout!.
using Test
# Dense
base = rand(3, 4)
@test flipped(base) == flip(base)
@test brief(base) == "3 x 4 x Float64 in Columns (Dense)"
@test brief(flip(base)) == "4 x 3 x Float64 in Rows (Permute, Dense)"
@test brief(flipped(base)) == "4 x 3 x Float64 in Columns (Dense)"
# Named
using NamedArrays
base = NamedArray(rand(3, 4))
@test flipped(base) == flip(base)
@test brief(base) == "3 x 4 x Float64 in Columns (Named, Dense)"
@test brief(flip(base)) == "4 x 3 x Float64 in Rows (Named, Permute, Dense)"
@test brief(flipped(base)) == "4 x 3 x Float64 in Columns (Named, Dense)"
# Permuted
base = PermutedDimsArray(rand(3, 4), (2,1))
@test flipped(base) == flip(base)
@test brief(base) == "4 x 3 x Float64 in Rows (Permute, Dense)"
@test brief(flip(base)) == "3 x 4 x Float64 in Columns (Dense)"
@test brief(flipped(base)) == "3 x 4 x Float64 in Rows (Permute, Dense)"
base = PermutedDimsArray(rand(3, 4), (1,2))
@test flipped(base) == flip(base)
@test brief(base) == "3 x 4 x Float64 in Columns (!Permute, Dense)"
@test brief(flip(base)) == "4 x 3 x Float64 in Rows (Permute, Dense)"
@test brief(flipped(base)) == "4 x 3 x Float64 in Columns (Dense)"
# LinearAlgebra
using LinearAlgebra
base = transpose(rand(3, 4))
@test flipped(base) == transpose(base)
@test brief(base) == "4 x 3 x Float64 in Rows (Transpose, Dense)"
@test brief(transpose(base)) == "3 x 4 x Float64 in Columns (Dense)"
@test brief(flipped(base)) == "3 x 4 x Float64 in Rows (Permute, Dense)"
base = adjoint(rand(3, 4))
@test flipped(base) == transpose(base)
@test brief(base) == "4 x 3 x Float64 in Rows (Adjoint, Dense)"
@test brief(flip(base)) == "3 x 4 x Float64 in Columns (Dense)"
@test brief(flipped(base)) == "3 x 4 x Float64 in Rows (Permute, Dense)"
# ReadOnly
base = read_only_array(rand(3, 4))
@test flipped(base) == transpose(base)
@test brief(base) == "3 x 4 x Float64 in Columns (ReadOnly, Dense)"
@test brief(flip(base)) == "4 x 3 x Float64 in Rows (ReadOnly, Permute, Dense)"
@test brief(flipped(base)) == "4 x 3 x Float64 in Columns (Dense)"
# Sparse
using SparseArrays
base = SparseMatrixCSC([0.0 1.0 2.0; 3.0 4.0 0.0])
@test flipped(base) == flip(base)
@test brief(base) == "2 x 3 x Float64 in Columns (Sparse 4 (67%) [Int64])"
@test brief(flip(base)) == "3 x 2 x Float64 in Rows (Permute, Sparse 4 (67%) [Int64])"
@test brief(flipped(base)) == "3 x 2 x Float64 in Columns (Sparse 4 (67%) [Int64])"
println("OK")
# output
OK
TanayLabUtilities.MatrixLayouts.relayout! — Function
relayout!(destination::AbstractMatrix, source::AbstractMatrix)::AbstractMatrix
relayout!(destination::AbstractMatrix, source::NamedMatrix)::NamedMatrix
Return the same matrix data, but in the other memory layout.
Suppose you have a column-major UMIs matrix, whose rows are cells and columns are genes. Looping on the UMIs of a gene will therefore be fast, but looping on the UMIs of a cell will be slow. A flip (or transpose, without the !, or PermutedDimsArray) of the matrix is fast; it creates a zero-copy wrapper of the matrix with flipped axes, so its rows will be genes and its columns will be cells, but in row-major layout. Therefore, looping on the UMIs of a gene is still fast, and looping on the UMIs of a cell is still slow.
In contrast, transpose! (with the !) (or flipped) is slow; it creates a rearranged copy of the data, also returning a matrix whose rows are genes and columns are cells, but this time in column-major layout. Therefore, in this case looping on the UMIs of a gene will be slow, and looping on the UMIs of a cell will be fast.
The relayout! is essentially a zero-copy flip of the slow transpose!. You end up with a matrix that appears to be the same as the original (rows are cells and columns are genes), but behaves differently: looping on the UMIs of a gene will be slow, and looping on the UMIs of a cell will be fast. In addition, relayout! works for both sparse and dense matrices. If the source is a NamedMatrix, then the result will be a NamedMatrix with the same axes (zero-copy, shared from the source). If the destination is already a NamedMatrix, then its axes must match those of the source.
The caller is responsible for providing a sensible destination matrix (sparse for a sparse source, dense for a non-sparse source, with compatible storage sizes).
It is almost always worthwhile to relayout! a matrix and then loop "with the grain" of the data, rather than skip it and loop "against the grain" of the data. This is because (in Julia at least) the implementation of transpose! is optimized for the task, while the other operations typically don't provide any specific optimizations for working "against the grain". The benefits of relayout! become more significant the more operations are done on the data inside the loop.
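For instance, the cells-by-genes scenario above might look like the following sketch (umis and its size are hypothetical; relayout, documented below, allocates a suitable destination and calls relayout! for you):
umis = rand(0:100, 1000, 500)        # hypothetical UMIs: rows are cells, columns are genes
@assert major_axis(umis) == Columns  # looping on the UMIs of a gene (column) is fast
by_cell = relayout(umis)             # slow, one-time rearrangement of the data
@assert by_cell == umis              # appears to be the same matrix...
@assert major_axis(by_cell) == Rows  # ...but now looping on the UMIs of a cell (row) is fast
sums_of_cells = [sum(view(by_cell, cell, :)) for cell in 1:size(by_cell, Rows)]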
using Test
using LinearAlgebra
source = rand(3, 4)
destination = flip(rand(4, 3))
result = relayout!(destination, source)
@test result === destination
@test brief(source) == "3 x 4 x Float64 in Columns (Dense)"
@test brief(result) == "3 x 4 x Float64 in Rows (Permute, Dense)"
@test result == source
# Named
using NamedArrays
named_source = NamedArray(rand(3, 4))
destination = flip(rand(4, 3))
result = relayout!(destination, named_source)
@test parent(result) === destination
@test brief(named_source) == "3 x 4 x Float64 in Columns (Named, Dense)"
@test brief(result) == "3 x 4 x Float64 in Rows (Named, Permute, Dense)"
@test result == named_source
source = rand(3, 4)
named_destination = NamedArray(flip(rand(4, 3)))
result = relayout!(named_destination, source)
@test result === named_destination
@test brief(source) == "3 x 4 x Float64 in Columns (Dense)"
@test brief(result) == "3 x 4 x Float64 in Rows (Named, Permute, Dense)"
@test result == source
source = rand(3, 4)
named_destination = Transpose(NamedArray(rand(4, 3)))
result = relayout!(named_destination, source)
@test result === named_destination
@test brief(source) == "3 x 4 x Float64 in Columns (Dense)"
@test brief(result) == "3 x 4 x Float64 in Rows (Transpose, Named, Dense)"
@test result == source
named_source = NamedArray(rand(3, 4))
named_destination = NamedArray(flip(rand(4, 3)))
result = relayout!(named_destination, named_source)
@test result === named_destination
@test brief(named_source) == "3 x 4 x Float64 in Columns (Named, Dense)"
@test brief(result) == "3 x 4 x Float64 in Rows (Named, Permute, Dense)"
@test result == named_source
# Permuted
source = rand(3, 4)
destination = PermutedDimsArray(rand(4, 3), (2,1))
result = relayout!(destination, source)
@test result === destination
@test brief(source) == "3 x 4 x Float64 in Columns (Dense)"
@test brief(result) == "3 x 4 x Float64 in Rows (Permute, Dense)"
@test result == source
source = rand(3, 4)
destination = PermutedDimsArray(adjoint(rand(4, 3)), (1,2))
result = relayout!(destination, source)
@test result === destination
@test brief(source) == "3 x 4 x Float64 in Columns (Dense)"
@test brief(result) == "3 x 4 x Float64 in Rows (!Permute, Adjoint, Dense)"
@test result == source
# Sparse
using SparseArrays
source = SparseMatrixCSC([0.0 1.0 2.0; 3.0 4.0 0.0])
destination = flip(SparseMatrixCSC([30.0 0.0; 0.0 40.0; 20.0 10.0]))
result = relayout!(destination, source)
@test result === destination
@test brief(source) == "2 x 3 x Float64 in Columns (Sparse 4 (67%) [Int64])"
@test brief(result) == "2 x 3 x Float64 in Rows (Permute, Sparse 4 (67%) [Int64])"
@test result == source
println("OK")
# output
OK
TanayLabUtilities.MatrixLayouts.relayout — Function
relayout(matrix::AbstractMatrix)::AbstractMatrix
relayout(matrix::NamedMatrix)::NamedMatrix
Same as relayout! but allocates the destination matrix for you. It is equivalent to flip(flipped(matrix)).
base = rand(3, 4)
@assert relayout(base) == base
@assert major_axis(relayout(base)) == minor_axis(base)
# output
TanayLabUtilities.MatrixLayouts.@assert_vector — Macro
@assert_vector(vector::Any, [n_elements::Integer])
Assert that the vector is an AbstractVector, and optionally that it has n_elements, with a friendly error message if it fails.
vector = [0, 1, 2]
@assert_vector(vector)
n_elements = 3
@assert_vector(vector, n_elements)
m_elements = 2
@assert_vector(vector, m_elements)
# output
ERROR: wrong size: 3
of the vector: vector
is different from m_elements: 2
scalar = 1
@assert_vector(scalar)
# output
ERROR: non-vector scalar: 1
TanayLabUtilities.MatrixLayouts.@assert_matrix — Macro
@assert_matrix(matrix::Any, [n_rows::Integer, n_columns::Integer], [major_axis::Int8])
Assert that the matrix is an AbstractMatrix, and optionally that it has n_rows and n_columns. If the major_axis is given and does not match the matrix, invokes the GLOBAL_INEFFICIENT_ACTION_HANDLER.
matrix = [0 1 2; 3 4 0]
@assert_matrix(matrix)
n_rows, n_columns = (2, 3)
@assert_matrix(matrix, Columns)
@assert_matrix(matrix, n_rows, n_columns)
@assert_matrix(matrix, n_rows, n_columns, Columns)
m_rows, m_columns = (3, 2)
@assert_matrix(matrix, m_rows, m_columns)
# output
ERROR: wrong size: (2, 3)
of the matrix: matrix
is different from (m_rows, m_columns): (3, 2)
matrix = [0 1 2; 3 4 0]
TanayLabUtilities.MatrixLayouts.GLOBAL_INEFFICIENT_ACTION_HANDLER = IgnoreHandler
@assert_matrix(matrix, Rows)
TanayLabUtilities.MatrixLayouts.GLOBAL_INEFFICIENT_ACTION_HANDLER = ErrorHandler
@assert_matrix(matrix, Rows)
# output
ERROR: inefficient major axis: Columns
for matrix: 2 x 3 x Int64 in Columns (Dense)
TanayLabUtilities.MatrixLayouts.GLOBAL_INEFFICIENT_ACTION_HANDLER — Constant
The AbnormalHandler to use when accessing a matrix in an inefficient way ("against the grain"). The default handler is WarnHandler; it can be overridden by assigning to it directly, as in the @assert_matrix example above and the sketch below.
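A minimal sketch (following the assignment pattern used in the @assert_matrix example above) of temporarily overriding the handler and then restoring it:
matrix = [0 1 2; 3 4 0]  # column-major
previous = TanayLabUtilities.MatrixLayouts.GLOBAL_INEFFICIENT_ACTION_HANDLER
TanayLabUtilities.MatrixLayouts.GLOBAL_INEFFICIENT_ACTION_HANDLER = IgnoreHandler
try
    # Deliberately working "against the grain"; would otherwise warn (by default) or error.
    @assert_matrix(matrix, Rows)
finally
    TanayLabUtilities.MatrixLayouts.GLOBAL_INEFFICIENT_ACTION_HANDLER = previous
end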
Index
TanayLabUtilities.MatrixLayouts
TanayLabUtilities.MatrixLayouts.Columns
TanayLabUtilities.MatrixLayouts.GLOBAL_INEFFICIENT_ACTION_HANDLER
TanayLabUtilities.MatrixLayouts.Rows
TanayLabUtilities.MatrixLayouts.axis_name
TanayLabUtilities.MatrixLayouts.flip
TanayLabUtilities.MatrixLayouts.flipped
TanayLabUtilities.MatrixLayouts.major_axis
TanayLabUtilities.MatrixLayouts.minor_axis
TanayLabUtilities.MatrixLayouts.other_axis
TanayLabUtilities.MatrixLayouts.relayout
TanayLabUtilities.MatrixLayouts.relayout!
TanayLabUtilities.MatrixLayouts.require_major_axis
TanayLabUtilities.MatrixLayouts.require_minor_axis
TanayLabUtilities.MatrixLayouts.@assert_matrix
TanayLabUtilities.MatrixLayouts.@assert_vector