Matrix layouts

TanayLabUtilities.MatrixLayouts Module

Matrix data that has a clear layout, that is, a major_axis, regardless of whether it is dense or sparse.

That is, for Columns-major data, the values of each column are laid out consecutively in memory (each column is a single contiguous vector), so any operation that works on whole columns will be fast (e.g., summing the values of each column). In contrast, the values of each row are stored far apart from each other, so any operation that works on whole rows will be very slow in comparison (e.g., summing the values of each row).

For Rows-major data, the values of each row are laid out consecutively in memory (each row is a single contiguous vector). In contrast, the values of each column are stored far apart from each other. In this case, summing columns would be slow, and summing rows would be fast.
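To make the difference concrete, here is a small illustrative sketch using only Base Julia (no functions from this module); Julia's built-in Matrix type is column-major:

```julia
# A plain-Julia sketch of "with the grain" vs. "against the grain" access.
# Julia's built-in Matrix is column-major: each column is contiguous.
matrix = rand(1000, 1000)

# With the grain: the inner loop walks down a contiguous column.
column_sums = [sum(@view matrix[:, j]) for j in 1:size(matrix, 2)]

# Against the grain: the inner loop strides across memory, touching one
# element per column, which is much slower for large matrices.
row_sums = [sum(@view matrix[i, :]) for i in 1:size(matrix, 1)]

@assert isapprox(sum(column_sums), sum(row_sums))  # same total either way
```

Both comprehensions compute correct results; the difference is purely in memory access order, which dominates the run time for large matrices.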

This is much simpler than the ArrayLayouts module, which attempts to fully describe the layout of N-dimensional arrays, a much more ambitious goal that is overkill for our needs.

Note

The "default" layout in Julia is column-major, which it inherits from MATLAB, which inherited it from Fortran, allegedly because this is more efficient for some linear algebra operations. In contrast, most systems languages and Python's NumPy use row-major layout by default, because that's the sane layout (and generalizes better for tensors). At any rate, all systems work just fine with data in either memory layout; the key consideration is to keep track of the layout, and to apply operations "with the grain" rather than "against the grain" of the data.

TanayLabUtilities.MatrixLayouts.axis_name Function
axis_name(axis::Maybe{Integer})::String

Return the name of the axis (for messages).

println(axis_name(nothing))
println(axis_name(Rows))
println(axis_name(Columns))
println(axis_name(3))

# output

nothing
Rows
Columns
ERROR: invalid matrix axis: 3

TanayLabUtilities.MatrixLayouts.major_axis Function
major_axis(matrix::AbstractMatrix)::Maybe{Int8}

Return the index of the major axis of a matrix, that is, the axis one should keep fixed for an efficient inner loop accessing the matrix elements. If the matrix doesn't support any efficient access axis, returns nothing.

using Test

base = [0 1 2; 3 4 0]

@test major_axis(base) == Columns

# Slice

@test major_axis(@view base[:, [1, 3, 2]]) == Columns

# Named

using NamedArrays

@test major_axis(NamedArray(base)) == Columns

# Permuted

permuted = PermutedDimsArray(base, (2, 1))
@test major_axis(permuted) == Rows

@test flip(permuted) === base

unpermuted = PermutedDimsArray(base, (1, 2))
@test major_axis(unpermuted) == Columns

# LinearAlgebra

transposed = transpose(base)
@test major_axis(transposed) == Rows

@test flip(transposed) === base

adjointed = adjoint(base)
@test major_axis(adjointed) == Rows

# Sparse

using SparseArrays

sparse = SparseMatrixCSC(base)
@test major_axis(sparse) == Columns

println("OK")

# output

OK

TanayLabUtilities.MatrixLayouts.require_major_axis Function
require_major_axis(matrix::AbstractMatrix)::Int8

Similar to major_axis but will error if the matrix isn't in either row-major or column-major layout.

using Test

base = [0 1 2; 3 4 0]

@test require_major_axis(base) == Columns
@test require_major_axis(@view base[:, [1, 3, 2]]) == Columns

println("OK")

# output

OK

TanayLabUtilities.MatrixLayouts.minor_axis Function
minor_axis(matrix::AbstractMatrix)::Maybe{Int8}

Return the index of the minor axis of a matrix, that is, the axis one should vary for an efficient inner loop accessing the matrix elements. If the matrix doesn't support any efficient access axis, returns nothing.

using Test

base = [0 1 2; 3 4 0]

@test minor_axis(base) == Rows

# Slice

@test minor_axis(@view base[:, [1, 3, 2]]) == Rows

# Named

using NamedArrays

@test minor_axis(NamedArray(base)) == Rows

# Permuted

permuted = PermutedDimsArray(base, (2, 1))
@test minor_axis(permuted) == Columns

@test flip(permuted) === base

unpermuted = PermutedDimsArray(base, (1, 2))
@test minor_axis(unpermuted) == Rows

# LinearAlgebra

transposed = transpose(base)
@test minor_axis(transposed) == Columns

@test flip(transposed) === base

adjointed = adjoint(base)
@test minor_axis(adjointed) == Columns

@test flip(adjointed) === base

# Sparse

using SparseArrays

sparse = SparseMatrixCSC(base)
@test minor_axis(sparse) == Rows

println("OK")

# output

OK

TanayLabUtilities.MatrixLayouts.require_minor_axis Function
require_minor_axis(matrix::AbstractMatrix)::Int8

Similar to minor_axis but will error if the matrix isn't in either row-major or column-major layout.

using Test

base = [0 1 2; 3 4 0]

@test require_minor_axis(base) == Rows

@test require_minor_axis(@view base[:, [1, 3, 2]]) == Rows

println("OK")

# output

OK

TanayLabUtilities.MatrixLayouts.other_axis Function
other_axis(axis::Maybe{Integer})::Maybe{Int8}

Return the other matrix axis (that is, convert between Rows and Columns). If given nothing, returns nothing.

using Test

@test other_axis(nothing) === nothing
@test other_axis(Rows) == Columns
@test other_axis(Columns) == Rows

other_axis(3)

# output

ERROR: invalid matrix axis: 3

TanayLabUtilities.MatrixLayouts.flip Function
flip(matrix::AbstractMatrix)::AbstractMatrix

Flip the axes of a matrix. This applies PermutedDimsArray to the matrix. However, unlike the standard Julia functions, which in their infinite wisdom blindly add a wrapper on top of the matrix, this function looks at the input first. As long as it is not a matrix of Complex values, it will strip away an existing Transpose and/or Adjoint and/or PermutedDimsArray wrapper instead of attaching an additional one. In addition, flip will also look inside a ReadOnlyArray and/or NamedArray to cancel out an internal flip wrapper.

Note

Something along these lines really should have been in Base. Since there isn't one, everyone is expected to create their own version of this - this is ours. We try to use it "universally" in our code instead of PermutedDimsArray, transpose, adjoint, m', etc.
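As a minimal sketch of the wrapper-stripping behavior (restating the major_axis doctests above; it assumes the module is loaded), flipping a wrapped matrix returns the original object itself instead of stacking another wrapper:

```julia
using Test

base = [0 1 2; 3 4 0]

# flip of an already-flipped matrix cancels the wrapper, returning the
# very same object (===), rather than wrapping it a second time.
@test flip(transpose(base)) === base
@test flip(PermutedDimsArray(base, (2, 1))) === base
```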

TanayLabUtilities.MatrixLayouts.flipped Function
flipped(matrix::AbstractMatrix)::AbstractMatrix

Return a transpose of a matrix; but instead of simply using a zero-copy wrapper, this actually rearranges the data. See relayout!.

using Test

# Dense

base = rand(3, 4)
@test flipped(base) == flip(base)
@test brief(base) == "3 x 4 x Float64 in Columns (Dense)"
@test brief(flip(base)) == "4 x 3 x Float64 in Rows (Permute, Dense)"
@test brief(flipped(base)) == "4 x 3 x Float64 in Columns (Dense)"

# Named

using NamedArrays

base = NamedArray(rand(3, 4))
@test flipped(base) == flip(base)
@test brief(base) == "3 x 4 x Float64 in Columns (Named, Dense)"
@test brief(flip(base)) == "4 x 3 x Float64 in Rows (Named, Permute, Dense)"
@test brief(flipped(base)) == "4 x 3 x Float64 in Columns (Named, Dense)"

# Permuted

base = PermutedDimsArray(rand(3, 4), (2,1))
@test flipped(base) == flip(base)
@test brief(base) == "4 x 3 x Float64 in Rows (Permute, Dense)"
@test brief(flip(base)) == "3 x 4 x Float64 in Columns (Dense)"
@test brief(flipped(base)) == "3 x 4 x Float64 in Rows (Permute, Dense)"

base = PermutedDimsArray(rand(3, 4), (1,2))
@test flipped(base) == flip(base)
@test brief(base) == "3 x 4 x Float64 in Columns (!Permute, Dense)"
@test brief(flip(base)) == "4 x 3 x Float64 in Rows (Permute, Dense)"
@test brief(flipped(base)) == "4 x 3 x Float64 in Columns (Dense)"

# LinearAlgebra

using LinearAlgebra

base = transpose(rand(3, 4))
@test flipped(base) == transpose(base)
@test brief(base) == "4 x 3 x Float64 in Rows (Transpose, Dense)"
@test brief(transpose(base)) == "3 x 4 x Float64 in Columns (Dense)"
@test brief(flipped(base)) == "3 x 4 x Float64 in Rows (Permute, Dense)"

base = adjoint(rand(3, 4))
@test flipped(base) == transpose(base)
@test brief(base) == "4 x 3 x Float64 in Rows (Adjoint, Dense)"
@test brief(flip(base)) == "3 x 4 x Float64 in Columns (Dense)"
@test brief(flipped(base)) == "3 x 4 x Float64 in Rows (Permute, Dense)"

# ReadOnly

base = read_only_array(rand(3, 4))
@test flipped(base) == transpose(base)
@test brief(base) == "3 x 4 x Float64 in Columns (ReadOnly, Dense)"
@test brief(flip(base)) == "4 x 3 x Float64 in Rows (ReadOnly, Permute, Dense)"
@test brief(flipped(base)) == "4 x 3 x Float64 in Columns (Dense)"

# Sparse

using SparseArrays

base = SparseMatrixCSC([0.0 1.0 2.0; 3.0 4.0 0.0])
@test flipped(base) == flip(base)
@test brief(base) == "2 x 3 x Float64 in Columns (Sparse 4 (67%) [Int64])"
@test brief(flip(base)) == "3 x 2 x Float64 in Rows (Permute, Sparse 4 (67%) [Int64])"
@test brief(flipped(base)) == "3 x 2 x Float64 in Columns (Sparse 4 (67%) [Int64])"

println("OK")

# output

OK

TanayLabUtilities.MatrixLayouts.relayout! Function
relayout!(destination::AbstractMatrix, source::AbstractMatrix)::AbstractMatrix
relayout!(destination::AbstractMatrix, source::NamedMatrix)::NamedMatrix

Return the same matrix data, but in the other memory layout.

Suppose you have a column-major UMIs matrix, whose rows are cells and whose columns are genes. Then looping on the UMIs of a gene will be fast, but looping on the UMIs of a cell will be slow. A flip (or transpose, without a !, or PermutedDimsArray) of the matrix is fast; it creates a zero-copy wrapper with flipped axes, so its rows will be genes and its columns will be cells, but in row-major layout. Therefore, still, looping on the UMIs of a gene is fast, and looping on the UMIs of a cell is slow.

In contrast, transpose! (with a !) or flipped is slow; it creates a rearranged copy of the data, also returning a matrix whose rows are genes and whose columns are cells, but this time in column-major layout. Therefore, in this case looping on the UMIs of a gene will be slow, and looping on the UMIs of a cell will be fast.

The relayout! is essentially a zero-copy flip of the slow transpose!. You end up with a matrix that appears to be the same as the original (rows are cells and columns are genes), but behaves differently: looping on the UMIs of a gene will be slow, and looping on the UMIs of a cell will be fast. In addition, relayout! works for both sparse and dense matrices. If the source is a NamedMatrix, then the result will be a NamedMatrix with the same axes (zero-copy, shared with the source). If the destination is already a NamedMatrix, then its axes must match those of the source.

The caller is responsible for providing a sensible destination matrix (sparse for a sparse source, dense for a non-sparse source, with compatible storage sizes).

Note

It is almost always worthwhile to relayout! a matrix and then loop "with the grain" of the data, instead of skipping this and looping "against the grain". This is because (in Julia at least) the implementation of transpose! is optimized for the task, while the other operations typically don't provide any specific optimizations for working "against the grain" of the data. The benefits of a relayout! become more significant the more operations are done on the data in the loop.
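As a hypothetical usage sketch of this pattern (the umis matrix, its shape, and its element type are assumptions, following the cells-by-genes example above; it assumes the module is loaded):

```julia
# Hypothetical sketch following the cells x genes example above.
umis = rand(UInt32(0):UInt32(100), 1000, 500)  # column-major: gene loops fast

# Pay the one-time relayout! cost into a row-major destination, following
# the destination pattern of the doctests below (a flip of a dense matrix
# with swapped dimensions).
destination = flip(Matrix{UInt32}(undef, 500, 1000))
umis_by_cell = relayout!(destination, umis)

# Then loop "with the grain": each cell's row is now contiguous.
total_per_cell = [sum(@view umis_by_cell[cell, :]) for cell in 1:1000]
@assert total_per_cell == vec(sum(umis; dims = 2))
```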

using Test

using LinearAlgebra

source = rand(3, 4)
destination = flip(rand(4, 3))

result = relayout!(destination, source)
@test result === destination
@test brief(source) == "3 x 4 x Float64 in Columns (Dense)"
@test brief(result) == "3 x 4 x Float64 in Rows (Permute, Dense)"
@test result == source

# Named

using NamedArrays

named_source = NamedArray(rand(3, 4))
destination = flip(rand(4, 3))
result = relayout!(destination, named_source)
@test parent(result) === destination
@test brief(named_source) == "3 x 4 x Float64 in Columns (Named, Dense)"
@test brief(result) == "3 x 4 x Float64 in Rows (Named, Permute, Dense)"
@test result == named_source

source = rand(3, 4)
named_destination = NamedArray(flip(rand(4, 3)))
result = relayout!(named_destination, source)
@test result === named_destination
@test brief(source) == "3 x 4 x Float64 in Columns (Dense)"
@test brief(result) == "3 x 4 x Float64 in Rows (Named, Permute, Dense)"
@test result == source

source = rand(3, 4)
named_destination = Transpose(NamedArray(rand(4, 3)))
result = relayout!(named_destination, source)
@test result === named_destination
@test brief(source) == "3 x 4 x Float64 in Columns (Dense)"
@test brief(result) == "3 x 4 x Float64 in Rows (Transpose, Named, Dense)"
@test result == source

named_source = NamedArray(rand(3, 4))
named_destination = NamedArray(flip(rand(4, 3)))
result = relayout!(named_destination, named_source)
@test result === named_destination
@test brief(named_source) == "3 x 4 x Float64 in Columns (Named, Dense)"
@test brief(result) == "3 x 4 x Float64 in Rows (Named, Permute, Dense)"
@test result == named_source

# Permuted

source = rand(3, 4)
destination = PermutedDimsArray(rand(4, 3), (2,1))
result = relayout!(destination, source)
@test result === destination
@test brief(source) == "3 x 4 x Float64 in Columns (Dense)"
@test brief(result) == "3 x 4 x Float64 in Rows (Permute, Dense)"
@test result == source

source = rand(3, 4)
destination = PermutedDimsArray(adjoint(rand(4, 3)), (1,2))
result = relayout!(destination, source)
@test result === destination
@test brief(source) == "3 x 4 x Float64 in Columns (Dense)"
@test brief(result) == "3 x 4 x Float64 in Rows (!Permute, Adjoint, Dense)"
@test result == source

# Sparse

using SparseArrays

source = SparseMatrixCSC([0.0 1.0 2.0; 3.0 4.0 0.0])
destination = flip(SparseMatrixCSC([30.0 0.0; 0.0 40.0; 20.0 10.0]))
result = relayout!(destination, source)
@test result === destination
@test brief(source) == "2 x 3 x Float64 in Columns (Sparse 4 (67%) [Int64])"
@test brief(result) == "2 x 3 x Float64 in Rows (Permute, Sparse 4 (67%) [Int64])"
@test result == source

println("OK")

# output

OK

TanayLabUtilities.MatrixLayouts.relayout Function
relayout(matrix::AbstractMatrix)::AbstractMatrix
relayout(matrix::NamedMatrix)::NamedMatrix

Same as relayout! but allocates the destination matrix for you. It is equivalent to flip(flipped(matrix)).

base = rand(3, 4)
@assert relayout(base) == base
@assert major_axis(relayout(base)) == minor_axis(base)

# output

TanayLabUtilities.MatrixLayouts.@assert_vector Macro
@assert_vector(vector::Any, [n_elements::Integer])

Assert that the vector is an AbstractVector, and optionally that it has n_elements, with a friendly error message if it fails.

vector = [0, 1, 2]

@assert_vector(vector)
n_elements = 3
@assert_vector(vector, n_elements)
m_elements = 2
@assert_vector(vector, m_elements)

# output

ERROR: wrong size: 3
of the vector: vector
is different from m_elements: 2

scalar = 1
@assert_vector(scalar)

# output

ERROR: non-vector scalar: 1

TanayLabUtilities.MatrixLayouts.@assert_matrix Macro
@assert_matrix(matrix::Any, [n_rows::Integer, n_columns::Integer], [major_axis::Int8])

Assert that the matrix is an AbstractMatrix, and optionally that it has n_rows and n_columns. If the major_axis is given and does not match the matrix, this invokes the GLOBAL_INEFFICIENT_ACTION_HANDLER.

matrix = [0 1 2; 3 4 0]

@assert_matrix(matrix)
n_rows, n_columns = (2, 3)
@assert_matrix(matrix, Columns)
@assert_matrix(matrix, n_rows, n_columns)
@assert_matrix(matrix, n_rows, n_columns, Columns)

m_rows, m_columns = (3, 2)
@assert_matrix(matrix, m_rows, m_columns)

# output

ERROR: wrong size: (2, 3)
of the matrix: matrix
is different from (m_rows, m_columns): (3, 2)

matrix = [0 1 2; 3 4 0]

TanayLabUtilities.MatrixLayouts.GLOBAL_INEFFICIENT_ACTION_HANDLER = IgnoreHandler

@assert_matrix(matrix, Rows)

TanayLabUtilities.MatrixLayouts.GLOBAL_INEFFICIENT_ACTION_HANDLER = ErrorHandler

@assert_matrix(matrix, Rows)

# output

ERROR: inefficient major axis: Columns
for matrix: 2 x 3 x Int64 in Columns (Dense)
