Matrix Formats

TanayLabUtilities.MatrixFormats

— Module

Deal with (some of) the matrix formats. This obviously can't be comprehensive, but it should cover the matrix types we have encountered so far and hopefully falls back to reasonable defaults for more exotic matrix types.

In Julia, many array types are wrappers around "parent" arrays. The specific wrappers we deal with in most cases are NamedArray, which adds names to the rows and/or columns; PermutedDimsArray, which flips the order of the axes; Transpose and Adjoint, which likewise flip the axes (Adjoint also conjugates complex values); and ReadOnlyArray, which prevents mutating the array. And then there are more transformative wrappers such as SubArray, SparseVector and SparseMatrixCSC, PyArray, etc.

This makes life difficult. Specifically, you can't rely (much) on the type system to separate code dealing with different array types. For example, not all issparse arrays derive from AbstractSparseArray (because you might have a sparse array wrapped in something). It would have been great if there were isdense and isstrided functions to match and libraries actually used them to trigger optimized code but "that would have been too easy".
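As a small illustration of the point above, here is a sketch that uses only the standard SparseArrays and LinearAlgebra libraries (nothing from this module). It wraps a sparse matrix in Symmetric, which is just one convenient standard wrapper chosen for the example. The wrapped matrix still reports itself as sparse, yet it is not an AbstractSparseArray, so a method dispatching on that abstract type would not be selected for it.

```julia
using LinearAlgebra
using SparseArrays

# A small sparse matrix and a standard-library wrapper around it.
plain = sparse([1 0; 0 2])
wrapped = Symmetric(plain)

# The wrapper is still reported as sparse...
wrapped_is_sparse = issparse(wrapped)

# ...but it is not a subtype of AbstractSparseArray, so a method declared
# for `AbstractSparseArray` arguments would not be chosen for it.
wrapped_is_abstract_sparse = wrapped isa AbstractSparseArray

println(wrapped_is_sparse => wrapped_is_abstract_sparse)
```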

The code here tries to put this under some control so we can write robust code which "does the right thing", in most cases, at least when it comes to converting between formats. This means we are forced to provide alternatives to some built-in functions (for example, copying arrays). Sigh.

TanayLabUtilities.MatrixFormats.copy_array

— Function

copy_array(array::AbstractArray; eltype::Maybe{Type} = nothing, indtype::Maybe{Type} = nothing)::AbstractArray

Create a copy of an array. This differs from Base.copy in the following:

Copying a read-only array returns a mutable array. In contrast, both Base.copy and Base.deepcopy of a ReadOnlyArray will return a ReadOnlyArray, which is technically correct but rather pointless.
Copying a NamedArray returns a NamedArray that shares the names (but not the data storage).
Copying preserves the layout of the data; for example, a copy of a Transpose array is still a Transpose array. In contrast, while Base.deepcopy will preserve the layout, Base.copy will silently relayout the matrix, which is both expensive and unexpected.
Copying a sparse vector or matrix gives a sparse result. Copying anything else gives a simple dense array regardless of the original type. This is done because a deepcopy of PyArray will still share the underlying buffer, which removes the whole point of doing a copy. Sigh.
Copying a vector of anything derived from AbstractString returns a vector of AbstractString.
You can override the eltype of the array (and/or the indtype, if it is sparse).
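The layout difference between Base.copy and Base.deepcopy described above can be seen with the standard library alone. This sketch does not call copy_array; it only shows the built-in behavior that copy_array is meant to improve on.

```julia
using LinearAlgebra

base = [0 1 2; 3 4 0]
transposed = transpose(base)

# Base.copy materializes the transpose into a plain column-major Matrix,
# which means every element is moved to a new position in memory.
copied = copy(transposed)

# Base.deepcopy keeps the Transpose wrapper around a fresh parent array.
deep_copied = deepcopy(transposed)

println(typeof(copied) => typeof(deep_copied))
```

Both results compare equal to the original; only the wrapper (and hence the memory layout) differs.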

using Test

base = [0 1 2; 3 4 0]

# Dense

@test brief(base) == "2 x 3 x Int64 in Columns (Dense)"
@test brief(copy_array(base)) == "2 x 3 x Int64 in Columns (Dense)"
@test copy_array(base) == base
@test copy_array(base) !== base

@test copy_array(base; eltype = Int32) == base
@test brief(copy_array(base; eltype = Int32)) == "2 x 3 x Int32 in Columns (Dense)"

# Sparse

using SparseArrays

sparse = SparseMatrixCSC(base)
@test copy_array(sparse) == sparse
@test copy_array(sparse) !== sparse
@test brief(sparse) == "2 x 3 x Int64 in Columns (Sparse 4 (67%) [Int64])"
@test brief(copy_array(sparse)) == "2 x 3 x Int64 in Columns (Sparse 4 (67%) [Int64])"

@test copy_array(sparse; eltype = Int32) == sparse
@test brief(copy_array(sparse; eltype = Int32)) == "2 x 3 x Int32 in Columns (Sparse 4 (67%) [Int64])"

@test copy_array(sparse; indtype = Int8) == sparse
@test brief(copy_array(sparse; indtype = Int8)) == "2 x 3 x Int64 in Columns (Sparse 4 (67%) [Int8])"

# ReadOnly

read_only = read_only_array(base)
@test brief(read_only) == "2 x 3 x Int64 in Columns (ReadOnly, Dense)"
@test brief(copy_array(read_only)) == "2 x 3 x Int64 in Columns (Dense)"
@test copy_array(read_only) == read_only
@test copy_array(read_only) !== base

# Named

using NamedArrays

named = NamedArray(base)
@test brief(named) == "2 x 3 x Int64 in Columns (Named, Dense)"
@test brief(copy_array(named)) == "2 x 3 x Int64 in Columns (Named, Dense)"
@test copy_array(named) == named
@test parent(copy_array(named)) !== base

# Permuted

permuted = PermutedDimsArray(base, (2, 1))
@test brief(permuted) == "3 x 2 x Int64 in Rows (Permute, Dense)"
@test brief(copy_array(permuted)) == "3 x 2 x Int64 in Rows (Permute, Dense)"
@test copy_array(permuted) == permuted
@test parent(copy_array(permuted)) !== base

unpermuted = PermutedDimsArray(base, (1, 2))
@test brief(unpermuted) == "2 x 3 x Int64 in Columns (!Permute, Dense)"
@test brief(copy_array(unpermuted)) == "2 x 3 x Int64 in Columns (!Permute, Dense)"
@test copy_array(unpermuted) == unpermuted
@test parent(copy_array(unpermuted)) !== base

# LinearAlgebra

using LinearAlgebra

transposed = transpose(base)
@test brief(transposed) == "3 x 2 x Int64 in Rows (Transpose, Dense)"
@test brief(copy_array(transposed)) == "3 x 2 x Int64 in Rows (Transpose, Dense)"
@test copy_array(transposed) == transposed
@test parent(copy_array(transposed)) !== base

adjointed = adjoint(base)
@test brief(adjointed) == "3 x 2 x Int64 in Rows (Adjoint, Dense)"
@test brief(copy_array(adjointed)) == "3 x 2 x Int64 in Rows (Adjoint, Dense)"
@test copy_array(adjointed) == adjointed
@test parent(copy_array(adjointed)) !== base

println("OK")

# output

OK

using Test

# Dense

base = [0, 1, 2]

@test brief(base) == "3 x Int64 (Dense)"
@test brief(copy_array(base)) == "3 x Int64 (Dense)"
@test copy_array(base) == base
@test copy_array(base) !== base

# Sparse

using SparseArrays

sparse = SparseVector(base)
@test brief(sparse) == "3 x Int64 (Sparse 2 (67%) [Int64])"
@test brief(copy_array(sparse)) == "3 x Int64 (Sparse 2 (67%) [Int64])"
@test copy_array(sparse) == sparse
@test copy_array(sparse) !== sparse

# ReadOnly

read_only = read_only_array(base)
@test brief(read_only) == "3 x Int64 (ReadOnly, Dense)"
@test brief(copy_array(read_only)) == "3 x Int64 (Dense)"
@test copy_array(read_only) == read_only
@test copy_array(read_only) !== base

# Named

using NamedArrays

named = NamedArray(base)
@test brief(named) == "3 x Int64 (Named, Dense)"
@test brief(copy_array(named)) == "3 x Int64 (Named, Dense)"
@test copy_array(named) == named
@test parent(copy_array(named)) !== base

# LinearAlgebra

using LinearAlgebra

transposed = transpose(base)
@test brief(transposed) == "3 x Int64 (Transpose, Dense)"
@test brief(copy_array(transposed)) == "3 x Int64 (Transpose, Dense)"
@test copy_array(transposed) == transposed
@test parent(copy_array(transposed)) !== base

adjointed = adjoint(base)
@test brief(adjointed) == "3 x Int64 (Adjoint, Dense)"
@test brief(copy_array(adjointed)) == "3 x Int64 (Adjoint, Dense)"
@test copy_array(adjointed) == adjointed
@test parent(copy_array(adjointed)) !== base

# String

base = split("abc", "")

@test brief(base) == "3 x Str (Dense)"
@test brief(copy_array(base)) == "3 x Str (Dense)"
@test eltype(base) != AbstractString
@test eltype(copy_array(base)) == AbstractString
@test copy_array(base) == base

println("OK")

# output

OK

TanayLabUtilities.MatrixFormats.similar_array

— Function

similar_array(
    array::AbstractArray;
    [value::Any = undef,
    eltype::Maybe{Type} = nothing,
    default_major_axis::Maybe{Integer} = Columns]
)::AbstractArray

Return an array (vector or a matrix) similar to the given one. By default the data has the same eltype as the original, and is uninitialized unless you specify a value. The returned data is always dense (Vector or Matrix).

This is different from similar in that it will preserve the layout of a matrix (for example, similar_array of a transpose will also be a transpose). Also, similar_array of a NamedArray will be another NamedArray sharing the axes with the original, and ReadOnlyArray wrappers are stripped from the result. If the array is a matrix with no clear major_axis, such as a @views slice of a matrix, then the result will have the default_major_axis.
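For comparison, this is what the built-in similar does with a transposed matrix, using only the standard library. The last step, re-wrapping by hand, is just one possible way to keep the layout and is not claimed to be how similar_array is implemented.

```julia
using LinearAlgebra

base = rand(3, 4)
transposed = transpose(base)

# Base.similar of a transposed matrix allocates a plain Matrix with the
# transposed shape, so the row-major layout of the original is lost.
plain = similar(transposed)

# One way to keep the layout by hand: allocate like the parent, then re-wrap.
rewrapped = transpose(similar(parent(transposed)))

println(size(plain), " ", plain isa Transpose, " ", rewrapped isa Transpose)
```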

using Test

base = rand(3, 4)

@test brief(base) == "3 x 4 x Float64 in Columns (Dense)"
@test similar_array(base) !== base
@test brief(similar_array(base)) == "3 x 4 x Float64 in Columns (Dense)"

@test brief(similar_array(base; eltype = Int32)) == "3 x 4 x Int32 in Columns (Dense)"
@test brief(similar_array(base; value = 0.0)) == "3 x 4 x Float64 in Columns (Dense)"
@test all(similar_array(base; value = 0.0) .== 0)

# ReadOnly

read_only = read_only_array(base)
@test brief(read_only) == "3 x 4 x Float64 in Columns (ReadOnly, Dense)"
@test brief(similar_array(read_only)) == "3 x 4 x Float64 in Columns (Dense)"

# Named

using NamedArrays

named = NamedArray(base)
@test brief(named) == "3 x 4 x Float64 in Columns (Named, Dense)"
@test similar_array(named) !== named
@test brief(similar_array(named)) == "3 x 4 x Float64 in Columns (Named, Dense)"

# Permuted

permuted = PermutedDimsArray(base, (2, 1))
@test brief(permuted) == "4 x 3 x Float64 in Rows (Permute, Dense)"
@test similar_array(permuted) !== permuted
@test brief(similar_array(permuted)) == "4 x 3 x Float64 in Rows (Permute, Dense)"

# LinearAlgebra

transposed = transpose(base)
@test brief(transposed) == "4 x 3 x Float64 in Rows (Transpose, Dense)"
@test similar_array(transposed) !== transposed
@test brief(similar_array(transposed)) == "4 x 3 x Float64 in Rows (Transpose, Dense)"

adjointed = adjoint(base)
@test brief(adjointed) == "4 x 3 x Float64 in Rows (Adjoint, Dense)"
@test similar_array(adjointed) !== adjointed
@test brief(similar_array(adjointed)) == "4 x 3 x Float64 in Rows (Adjoint, Dense)"

println("OK")

# output

OK

TanayLabUtilities.MatrixFormats.sparse_matrix_csc

— Function

sparse_matrix_csc(
    matrix::AbstractMatrix;
    eltype::Maybe{Type} = nothing,
    indtype::Maybe{Type} = nothing
)::SparseMatrixCSC

sparse_matrix_csc(
    colptr::AbstractVector,
    rowval::AbstractVector,
    nzval::AbstractVector
)::Union{ReadOnlyArray, SparseMatrixCSC}

Create a sparse column-major matrix. This differs from the simple SparseMatrixCSC in the following ways:

The integer index type is UInt32 if possible. Only very large matrix sizes use UInt64. This greatly reduces the size of large matrices.
If constructing the matrix from three vectors, then if any of them are ReadOnlyArray, this will return a ReadOnlyArray wrapper for the result (which will internally refer to the mutable arrays).
If eltype is specified, this will be the element type of the result.
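To give a feel for why the index type matters, the following sketch compares the bytes used by the index arrays of the same matrix with 64-bit and 32-bit indices. It uses only the standard SparseMatrixCSC constructors rather than sparse_matrix_csc, and the tiny matrix is purely illustrative.

```julia
using SparseArrays

# Build a matrix with explicit 64-bit indices and 32-bit values.
wide = sparse(Int64[1, 2, 3], Int64[1, 2, 3], Float32[1, 2, 3])

# Convert to 32-bit unsigned indices, keeping the same values.
narrow = SparseMatrixCSC{Float32, UInt32}(wide)

# Index storage is the column pointers plus the row indices.
wide_index_bytes = sizeof(wide.colptr) + sizeof(wide.rowval)
narrow_index_bytes = sizeof(narrow.colptr) + sizeof(narrow.rowval)

println(wide_index_bytes => narrow_index_bytes)
```

With 32-bit values, each stored entry costs 12 bytes with Int64 indices but only 8 bytes with UInt32 indices, which adds up for large matrices.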

using Test

# Matrix

@test brief(sparse_matrix_csc([0 1 2; 3 4 0])) == "2 x 3 x Int64 in Columns (Sparse 4 (67%) [UInt32])"
@test brief(sparse_matrix_csc([0 1 2; 3 4 0]; eltype = Float32)) == "2 x 3 x Float32 in Columns (Sparse 4 (67%) [UInt32])"
@test brief(sparse_matrix_csc([0 1 2; 3 4 0]; indtype = UInt8)) == "2 x 3 x Int64 in Columns (Sparse 4 (67%) [UInt8])"

# Vectors

sparse = sparse_matrix_csc([0 1 2; 3 4 0])

@test brief(sparse_matrix_csc(2, 3, sparse.colptr, sparse.rowval, sparse.nzval)) == "2 x 3 x Int64 in Columns (Sparse 4 (67%) [UInt32])"
@test brief(sparse_matrix_csc(2, 3, read_only_array(sparse.colptr), read_only_array(sparse.rowval), read_only_array(sparse.nzval))) ==
      "2 x 3 x Int64 in Columns (ReadOnly, Sparse 4 (67%) [UInt32])"

println("OK")

# output

OK

TanayLabUtilities.MatrixFormats.sparse_vector

— Function

sparse_vector(
    vector::AbstractVector;
    eltype::Maybe{Type} = nothing,
    indtype::Maybe{Type} = nothing,
)::SparseVector

sparse_vector(
    size::Integer,
    inzind::AbstractVector,
    nzval::AbstractVector
)::Union{ReadOnlyArray, SparseVector}

Create a sparse vector. This differs from the simple SparseVector in the following ways:

The integer index type is UInt32 if possible. Only very large vector sizes use UInt64. This greatly reduces the size of large vectors.
If constructing the vector from two vectors, then if any of them are ReadOnlyArray, this will return a ReadOnlyArray wrapper for the result (which will internally refer to the mutable arrays).
If eltype is specified, this will be the element type of the result.

using Test

# Vector

@test brief(sparse_vector([0, 1, 2])) == "3 x Int64 (Sparse 2 (67%) [UInt32])"
@test brief(sparse_vector([0, 1, 2]; eltype = Float32)) == "3 x Float32 (Sparse 2 (67%) [UInt32])"

# Vectors

@test brief(sparse_vector(3, [1, 3], [1.0, 2.0])) == "3 x Float64 (Sparse 2 (67%) [Int64])"
@test brief(sparse_vector(3, read_only_array([1, 3]), read_only_array([1.0, 2.0]))) == "3 x Float64 (ReadOnly, Sparse 2 (67%) [Int64])"

println("OK")

# output

OK

TanayLabUtilities.MatrixFormats.sparse_mask_vector

— Function

sparse_mask_vector(
    size::Integer,
    inzind::AbstractVector
)::Union{ReadOnlyArray, SparseVector{Bool}}

Create a sparse mask vector using only the indices of the true entries. Alas, this still needs to allocate a vector of Bool for the data.
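The allocation mentioned above follows from how SparseVector stores its data: one value per stored index. A minimal sketch of the idea, using only the standard library, might look as follows. The helper name sketch_sparse_mask is hypothetical and this is not the library's implementation (in particular, it ignores ReadOnlyArray inputs).

```julia
using SparseArrays

# Hypothetical helper, not the library's implementation.
function sketch_sparse_mask(n_entries::Integer, true_indices::Vector{<:Integer})
    # SparseVector keeps one value per stored index, so a `true` must be
    # allocated for each of them even though the values carry no information.
    return SparseVector(n_entries, true_indices, fill(true, length(true_indices)))
end

mask = sketch_sparse_mask(3, [1, 3])
println(collect(mask))
```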

using Test

@test brief(sparse_mask_vector(3, [1, 3])) == "3 x Bool (Sparse 2 (67%) [Int64])"
@test brief(sparse_mask_vector(3, read_only_array([1, 3]))) == "3 x Bool (ReadOnly, Sparse 2 (67%) [Int64])"

println("OK")

# output

OK

TanayLabUtilities.MatrixFormats.dense_mask_vector

— Function

dense_mask_vector(
    size::Integer,
    inzind::AbstractVector
)::Vector{Bool}

Create a dense mask vector using only the indices of the true entries.
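Conceptually this amounts to starting from an all-false vector and setting the listed positions. The sketch below shows that idea in plain Julia; the helper name sketch_dense_mask is hypothetical and the library's actual code may differ.

```julia
# Hypothetical helper, not the library's implementation.
function sketch_dense_mask(n_entries::Integer, true_indices::AbstractVector{<:Integer})
    # Start with all entries false, then flip the requested positions.
    mask = zeros(Bool, n_entries)
    mask[true_indices] .= true
    return mask
end

mask = sketch_dense_mask(4, [1, 3])
println(count(mask), " of ", length(mask), " entries are true")
```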

println(brief(dense_mask_vector(4, [1, 3])))

# output

4 x Bool (Dense; 2 (50%) true)

TanayLabUtilities.MatrixFormats.sparsify

— Function

sparsify(
    matrix::AbstractMatrix;
    copy::Bool = false,
    eltype::Maybe{Type} = nothing,
    indtype::Maybe{Type} = nothing
)::AbstractMatrix

sparsify(
    vector::AbstractVector;
    copy::Bool = false,
    eltype::Maybe{Type} = nothing,
    indtype::Maybe{Type} = nothing
)::AbstractVector

Return a sparse version of an array, possibly forcing a different eltype and/or indtype. If given a dense matrix, the default indtype will be indtype_for_size for the matrix. This will preserve the matrix layout (for example, sparsify of a transposed matrix will be a transposed matrix). If copy, this will create a copy even if it is already sparse and has the correct eltype and indtype.

using Test
using SparseArrays

# Dense

dense = rand(3, 4)
@test sparsify(dense) == dense
@test brief(dense) == "3 x 4 x Float64 in Columns (Dense)"
@test brief(sparsify(dense)) == "3 x 4 x Float64 in Columns (Sparse 12 (100%) [UInt32])"

# Sparse

sparse = SparseMatrixCSC([0 1 2; 3 4 0])
@test sparsify(sparse) === sparse
@test brief(sparse) == "2 x 3 x Int64 in Columns (Sparse 4 (67%) [Int64])"

@test sparsify(sparse; copy = true) == sparse
@test sparsify(sparse; copy = true) !== sparse
@test brief(sparsify(sparse)) == "2 x 3 x Int64 in Columns (Sparse 4 (67%) [Int64])"

@test sparsify(sparse; eltype = Int8) == sparse
@test brief(sparsify(sparse; eltype = Int8)) == "2 x 3 x Int8 in Columns (Sparse 4 (67%) [Int64])"

@test sparsify(sparse; indtype = Int8) == sparse
@test brief(sparsify(sparse; indtype = Int8)) == "2 x 3 x Int64 in Columns (Sparse 4 (67%) [Int8])"

# ReadOnly

read_only = read_only_array(sparse)
@test sparsify(read_only) === read_only
@test brief(read_only) == "2 x 3 x Int64 in Columns (ReadOnly, Sparse 4 (67%) [Int64])"

read_only = read_only_array(dense)
@test sparsify(read_only) == read_only
@test brief(sparsify(read_only)) == "3 x 4 x Float64 in Columns (ReadOnly, Sparse 12 (100%) [UInt32])"

# Named

using NamedArrays

named = NamedArray(sparse)
@test sparsify(named) === named
@test brief(named) == "2 x 3 x Int64 in Columns (Named, Sparse 4 (67%) [Int64])"

named = NamedArray(dense)
@test sparsify(named) == named
@test brief(sparsify(named)) == "3 x 4 x Float64 in Columns (Named, Sparse 12 (100%) [UInt32])"

# Permuted

permuted = PermutedDimsArray(sparse, (2, 1))
@test sparsify(permuted) === permuted
@test brief(permuted) == "3 x 2 x Int64 in Rows (Permute, Sparse 4 (67%) [Int64])"

unpermuted = PermutedDimsArray(sparse, (1, 2))
@test sparsify(unpermuted) === unpermuted
@test brief(unpermuted) == "2 x 3 x Int64 in Columns (!Permute, Sparse 4 (67%) [Int64])"

permuted = PermutedDimsArray(dense, (2, 1))
@test sparsify(permuted) == permuted
@test brief(permuted) == "4 x 3 x Float64 in Rows (Permute, Dense)"
@test brief(sparsify(permuted)) == "4 x 3 x Float64 in Rows (Permute, Sparse 12 (100%) [UInt32])"

unpermuted = PermutedDimsArray(dense, (1, 2))
@test sparsify(unpermuted) == unpermuted
@test brief(unpermuted) == "3 x 4 x Float64 in Columns (!Permute, Dense)"
@test brief(sparsify(unpermuted)) == "3 x 4 x Float64 in Columns (!Permute, Sparse 12 (100%) [UInt32])"

# LinearAlgebra

transposed = transpose(sparse)
@test sparsify(transposed) === transposed
@test brief(transposed) == "3 x 2 x Int64 in Rows (Transpose, Sparse 4 (67%) [Int64])"

adjointed = adjoint(sparse)
@test sparsify(adjointed) === adjointed
@test brief(adjointed) == "3 x 2 x Int64 in Rows (Adjoint, Sparse 4 (67%) [Int64])"

transposed = transpose(dense)
@test sparsify(transposed) == transposed
@test brief(transposed) == "4 x 3 x Float64 in Rows (Transpose, Dense)"
@test brief(sparsify(transposed)) == "4 x 3 x Float64 in Rows (Transpose, Sparse 12 (100%) [UInt32])"

adjointed = adjoint(dense)
@test sparsify(adjointed) == adjointed
@test brief(adjointed) == "4 x 3 x Float64 in Rows (Adjoint, Dense)"
@test brief(sparsify(adjointed)) == "4 x 3 x Float64 in Rows (Adjoint, Sparse 12 (100%) [UInt32])"

println("OK")

# output

OK

using Test
using SparseArrays

# Dense

dense = rand(4)
@test sparsify(dense) == dense
@test brief(dense) == "4 x Float64 (Dense)"
@test brief(sparsify(dense)) == "4 x Float64 (Sparse 4 (100%) [UInt32])"

# Sparse

sparse = SparseVector([0, 1, 2, 0])
@test sparsify(sparse) === sparse
@test brief(sparse) == "4 x Int64 (Sparse 2 (50%) [Int64])"

@test sparsify(sparse; copy = true) == sparse
@test sparsify(sparse; copy = true) !== sparse
@test brief(sparsify(sparse)) == "4 x Int64 (Sparse 2 (50%) [Int64])"

@test sparsify(sparse; eltype = Int8) == sparse
@test brief(sparsify(sparse; eltype = Int8)) == "4 x Int8 (Sparse 2 (50%) [Int64])"

@test sparsify(sparse; indtype = Int8) == sparse
@test brief(sparsify(sparse; indtype = Int8)) == "4 x Int64 (Sparse 2 (50%) [Int8])"

println("OK")

# output

OK

TanayLabUtilities.MatrixFormats.densify

— Function

densify(matrix::AbstractMatrix; copy::Bool = false, eltype::Maybe{Type} = nothing)::AbstractMatrix
densify(vector::AbstractVector; copy::Bool = false, eltype::Maybe{Type} = nothing)::AbstractVector

Return a dense version of an array, possibly forcing a different eltype. This will preserve the matrix layout (for example, densify of a transposed matrix will be a transposed matrix). If copy, this will create a copy even if it is already dense and has the correct eltype.

using Test
using SparseArrays

# Dense

dense = rand(3, 4)
@test densify(dense) === dense
@test brief(dense) == "3 x 4 x Float64 in Columns (Dense)"

@test densify(dense; copy = true) !== dense
@test densify(dense; copy = true) == dense
@test brief(densify(dense; copy = true)) == "3 x 4 x Float64 in Columns (Dense)"

@test isapprox(densify(dense; eltype = Float32), dense)
@test brief(densify(dense; eltype = Float32)) == "3 x 4 x Float32 in Columns (Dense)"

# Sparse

sparse = SparseMatrixCSC([0 1 2; 3 4 0])

@test densify(sparse) == sparse
@test brief(densify(sparse)) == "2 x 3 x Int64 in Columns (Dense)"
@test brief(densify(sparse; eltype = Int8)) == "2 x 3 x Int8 in Columns (Dense)"

# ReadOnly

read_only = read_only_array(sparse)
@test densify(read_only) == read_only
@test brief(read_only) == "2 x 3 x Int64 in Columns (ReadOnly, Sparse 4 (67%) [Int64])"
@test brief(densify(read_only)) == "2 x 3 x Int64 in Columns (ReadOnly, Dense)"

read_only = read_only_array(dense)
@test densify(read_only) == dense

# Named

using NamedArrays

named = NamedArray(sparse)
@test densify(named) == named
@test brief(named) == "2 x 3 x Int64 in Columns (Named, Sparse 4 (67%) [Int64])"
@test brief(densify(named)) == "2 x 3 x Int64 in Columns (Named, Dense)"

named = NamedArray(dense)
@test densify(named) == dense

# Permuted

permuted = PermutedDimsArray(dense, (2, 1))
@test densify(permuted) === permuted
@test brief(permuted) == "4 x 3 x Float64 in Rows (Permute, Dense)"

unpermuted = PermutedDimsArray(dense, (1, 2))
@test densify(unpermuted) === unpermuted
@test brief(unpermuted) == "3 x 4 x Float64 in Columns (!Permute, Dense)"

permuted = PermutedDimsArray(sparse, (2, 1))
@test densify(permuted) == permuted
@test brief(permuted) == "3 x 2 x Int64 in Rows (Permute, Sparse 4 (67%) [Int64])"
@test brief(densify(permuted)) == "3 x 2 x Int64 in Rows (Permute, Dense)"

unpermuted = PermutedDimsArray(sparse, (1, 2))
@test densify(unpermuted) == unpermuted
@test brief(unpermuted) == "2 x 3 x Int64 in Columns (!Permute, Sparse 4 (67%) [Int64])"
@test brief(densify(unpermuted)) == "2 x 3 x Int64 in Columns (!Permute, Dense)"

# LinearAlgebra

transposed = transpose(dense)
@test densify(transposed) === transposed
@test brief(transposed) == "4 x 3 x Float64 in Rows (Transpose, Dense)"

adjointed = adjoint(dense)
@test densify(adjointed) === adjointed
@test brief(adjointed) == "4 x 3 x Float64 in Rows (Adjoint, Dense)"

transposed = transpose(sparse)
@test densify(transposed) == transposed
@test brief(transposed) == "3 x 2 x Int64 in Rows (Transpose, Sparse 4 (67%) [Int64])"
@test brief(densify(transposed)) == "3 x 2 x Int64 in Rows (Transpose, Dense)"

adjointed = adjoint(sparse)
@test densify(adjointed) == adjointed
@test brief(adjointed) == "3 x 2 x Int64 in Rows (Adjoint, Sparse 4 (67%) [Int64])"
@test brief(densify(adjointed)) == "3 x 2 x Int64 in Rows (Adjoint, Dense)"

println("OK")

# output

OK

using Test
using SparseArrays

# Sparse

sparse = SparseVector([0, 1, 2, 0])

@test densify(sparse) == sparse
@test brief(densify(sparse)) == "4 x Int64 (Dense)"

# Dense

dense = rand(4)
@test densify(dense) === dense
@test brief(dense) == "4 x Float64 (Dense)"

@test densify(dense; copy = true) !== dense
@test densify(dense; copy = true) == dense
@test brief(densify(dense; copy = true)) == "4 x Float64 (Dense)"

@test isapprox(densify(dense; eltype = Float32), dense)
@test brief(densify(dense; eltype = Float32)) == "4 x Float32 (Dense)"

println("OK")

# output

OK

TanayLabUtilities.MatrixFormats.bestify

— Function

bestify(
    matrix::AbstractMatrix;
    min_sparse_saving_fraction::AbstractFloat = 0.25,
    copy::Bool = false,
    eltype::Maybe{Type} = nothing,
)::AbstractMatrix

bestify(
    vector::AbstractVector;
    min_sparse_saving_fraction::AbstractFloat = 0.25,
    copy::Bool = false,
    eltype::Maybe{Type} = nothing,
)::AbstractVector

Return a "best" (dense or sparse) version of an array. The sparse format is chosen if it saves at least min_sparse_saving_fraction of the storage of the dense format. If copy, this will create a copy even if it is already in the best format.

If eltype is specified, this computes the savings (and creates the "best" version) using this element type. In addition, if given a sparse matrix, we consider the indtype_for_size for it, and if that saves min_sparse_saving_fraction relative to the current sparse representation, we'll create a new one using the better (smaller) indtype.
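To make the saving criterion concrete, here is a rough sketch of how the storage of the two formats could be compared for a CSC matrix. The helper names and the exact byte-counting formula are assumptions made for illustration; the library may count storage differently.

```julia
# Hypothetical helpers; the names and the exact formula are assumptions.

# Bytes needed for a dense matrix: one value per element.
dense_bytes(n_rows, n_columns, value_type) = n_rows * n_columns * sizeof(value_type)

# Bytes needed for CSC storage: one value and one row index per stored entry,
# plus one column pointer per column and a final sentinel pointer.
function csc_bytes(n_columns, n_stored, value_type, index_type)
    return n_stored * (sizeof(value_type) + sizeof(index_type)) +
           (n_columns + 1) * sizeof(index_type)
end

# Fraction of the dense storage saved by switching to CSC.
function saving_fraction(n_rows, n_columns, n_stored, value_type, index_type)
    return 1 - csc_bytes(n_columns, n_stored, value_type, index_type) /
               dense_bytes(n_rows, n_columns, value_type)
end

# A 5 x 5 Int32 matrix with 5 stored entries and UInt32 indices.
saving = saving_fraction(5, 5, 5, Int32, UInt32)
println(saving >= 0.25, " ", saving >= 0.5)
```

Under this estimate the dense form takes 100 bytes and the sparse form 64 bytes, a saving of about 36%. That clears the default threshold of 0.25 but not a threshold of 0.5.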

using Test
using LinearAlgebra

# Dense

dense = zeros(Int32, 5, 5)
view(dense, diagind(dense)) .= 1

@test bestify(dense) == dense
@test brief(bestify(dense)) == "5 x 5 x Int32 in Columns (Sparse 5 (20%) [UInt32])"

@test bestify(dense; min_sparse_saving_fraction = 0.5) === dense

# Sparse

sparse = sparse_matrix_csc(dense)
@test bestify(sparse) === sparse
@test brief(sparse) == "5 x 5 x Int32 in Columns (Sparse 5 (20%) [UInt32])"

# ReadOnly

read_only = read_only_array(dense)
@test bestify(read_only; min_sparse_saving_fraction = 0.5) === read_only
@test brief(read_only) == "5 x 5 x Int32 in Columns (ReadOnly, Dense)"

@test bestify(read_only) == read_only
@test brief(bestify(read_only)) == "5 x 5 x Int32 in Columns (ReadOnly, Sparse 5 (20%) [UInt32])"

read_only = read_only_array(sparse)
@test bestify(read_only) === read_only
@test brief(read_only) == "5 x 5 x Int32 in Columns (ReadOnly, Sparse 5 (20%) [UInt32])"

@test bestify(read_only; min_sparse_saving_fraction = 0.5) == read_only
@test brief(bestify(read_only; min_sparse_saving_fraction = 0.5)) == "5 x 5 x Int32 in Columns (ReadOnly, Dense)"

# Named

using NamedArrays

named = NamedArray(dense)
@test bestify(named; min_sparse_saving_fraction = 0.5) === named
@test brief(named) == "5 x 5 x Int32 in Columns (Named, Dense)"

@test bestify(named) == named
@test brief(bestify(named)) == "5 x 5 x Int32 in Columns (Named, Sparse 5 (20%) [UInt32])"

named = NamedArray(sparse)
@test bestify(named) === named
@test brief(named) == "5 x 5 x Int32 in Columns (Named, Sparse 5 (20%) [UInt32])"

@test bestify(named; min_sparse_saving_fraction = 0.5) == named
@test brief(bestify(named; min_sparse_saving_fraction = 0.5)) == "5 x 5 x Int32 in Columns (Named, Dense)"

# Permuted

permuted = PermutedDimsArray(dense, (2, 1))
@test bestify(permuted; min_sparse_saving_fraction = 0.5) === permuted
@test brief(permuted) == "5 x 5 x Int32 in Rows (Permute, Dense)"

@test bestify(permuted) == permuted
@test brief(bestify(permuted)) == "5 x 5 x Int32 in Rows (Permute, Sparse 5 (20%) [UInt32])"

permuted = PermutedDimsArray(sparse, (1, 2))
@test bestify(permuted) === permuted
@test brief(permuted) == "5 x 5 x Int32 in Columns (!Permute, Sparse 5 (20%) [UInt32])"

@test bestify(permuted; min_sparse_saving_fraction = 0.5) == permuted
@test brief(bestify(permuted; min_sparse_saving_fraction = 0.5)) == "5 x 5 x Int32 in Columns (!Permute, Dense)"

# LinearAlgebra

transposed = transpose(dense)
@test bestify(transposed; min_sparse_saving_fraction = 0.5) === transposed
@test brief(transposed) == "5 x 5 x Int32 in Rows (Transpose, Dense)"

@test bestify(transposed) == transposed
@test brief(bestify(transposed)) == "5 x 5 x Int32 in Rows (Transpose, Sparse 5 (20%) [UInt32])"

adjointed = adjoint(sparse)
@test bestify(adjointed) === adjointed
@test brief(adjointed) == "5 x 5 x Int32 in Rows (Adjoint, Sparse 5 (20%) [UInt32])"

@test bestify(adjointed; min_sparse_saving_fraction = 0.5) == adjointed
@test brief(bestify(adjointed; min_sparse_saving_fraction = 0.5)) == "5 x 5 x Int32 in Rows (Adjoint, Dense)"

println("OK")

# output

OK

using Test
using LinearAlgebra

# Dense

dense = zeros(Int32, 3)
dense[1] = 1

@test bestify(dense) == dense
@test brief(bestify(dense)) == "3 x Int32 (Sparse 1 (33%) [UInt32])"

@test bestify(dense; min_sparse_saving_fraction = 0.5) === dense

# Sparse

sparse = sparse_vector(dense)
@test bestify(sparse) === sparse
@test brief(sparse) == "3 x Int32 (Sparse 1 (33%) [UInt32])"

println("OK")

# output

OK

TanayLabUtilities.MatrixFormats.indtype_for_size

— Function

indtype_for_size(size::Integer)::Type

Return the integer data type which is large enough to hold indices and offsets for a SparseMatrixCSC matrix of some size (total number of elements). We try to use UInt32 whenever possible because for large matrices (especially with 32-bit value types) this will drastically reduce the amount of space used.
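The rule can be pictured as a simple threshold on the number of elements. The sketch below is an assumption about the shape of the logic, with a hypothetical name; the library's exact cutoff may differ.

```julia
# Hypothetical sketch; the library's actual threshold may differ.
function sketch_indtype_for_size(n_elements::Integer)
    # Use 32-bit indices whenever every index and offset fits in a UInt32.
    return n_elements <= typemax(UInt32) ? UInt32 : UInt64
end

small = sketch_indtype_for_size(10000000)
large = sketch_indtype_for_size(10000000000)
println(small, " ", large)
```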

println(10000000 => indtype_for_size(10000000))
println(10000000000 => indtype_for_size(10000000000))

# output

10000000 => UInt32
10000000000 => UInt64

TanayLabUtilities.MatrixFormats.colptr

— Function

colptr(sparse::AbstractMatrix)::AbstractVector{<:Integer}

Return the colptr of a sparse matrix.

using Test
using NamedArrays
using SparseArrays

sparse_matrix = SparseMatrixCSC([0 1 2; 3 4 0])
@assert colptr(sparse_matrix) === sparse_matrix.colptr
@assert colptr(read_only_array(sparse_matrix)) === sparse_matrix.colptr
@assert colptr(NamedArray(sparse_matrix)) === sparse_matrix.colptr

println("OK")

# output

OK

TanayLabUtilities.MatrixFormats.rowval

— Function

rowval(sparse::AbstractArray)::AbstractVector{<:Integer}

Return the rowval of a sparse array.

using Test
using NamedArrays
using SparseArrays

sparse_matrix = SparseMatrixCSC([0 1 2; 3 4 0])
@assert rowval(sparse_matrix) === sparse_matrix.rowval
@assert rowval(read_only_array(sparse_matrix)) === sparse_matrix.rowval
@assert rowval(NamedArray(sparse_matrix)) === sparse_matrix.rowval

println("OK")

# output

OK

TanayLabUtilities.MatrixFormats.nzind

— Function

nzind(sparse::AbstractVector)::AbstractVector{<:Integer}

Return the nzind of a sparse vector.

using Test
using NamedArrays
using SparseArrays

sparse_vector = SparseVector([0, 1, 2])
@assert nzind(sparse_vector) === sparse_vector.nzind
@assert nzind(read_only_array(sparse_vector)) === sparse_vector.nzind
@assert nzind(NamedArray(sparse_vector)) === sparse_vector.nzind

println("OK")

# output

OK

TanayLabUtilities.MatrixFormats.nzval

— Function

nzval(sparse::AbstractArray)::AbstractVector

Return the nzval of a sparse array.

using Test
using NamedArrays
using SparseArrays

sparse_matrix = SparseMatrixCSC([0 1 2; 3 4 0])
@assert nzval(sparse_matrix) === sparse_matrix.nzval
@assert nzval(read_only_array(sparse_matrix)) === sparse_matrix.nzval
@assert nzval(NamedArray(sparse_matrix)) === sparse_matrix.nzval

sparse_vector = SparseVector([0, 1, 2])
@assert nzval(sparse_vector) === sparse_vector.nzval
@assert nzval(read_only_array(sparse_vector)) === sparse_vector.nzval
@assert nzval(NamedArray(sparse_vector)) === sparse_vector.nzval

println("OK")

# output

OK
