Slanter
Slanter.slanted_orders
—
Function
function slanted_orders(
data::AbstractMatrix{<:Real};
order_rows::Bool=true,
order_cols::Bool=true,
squared_order::Bool=true,
same_order::Bool=false,
discount_outliers::Bool=true,
max_spin_count::Integer=10
)::Tuple{AbstractVector{<:Integer}, AbstractVector{<:Integer}}
Compute rows and columns orders which move high values close to the diagonal.
For a matrix expressing the cross-similarity between two (possibly different) sets of entities, this produces better results than clustering. This is because clustering does not care about the order of each two sub-partitions. That is, clustering is as happy with
((2, 1), (4, 3))
as it is with the more sensible
((1, 2), (3, 4))
. As a result, visualizations of similarities using naive clustering can be misleading.
This situation is worse in Python or R than it is in Julia, which mercifully provides the
:barjoseph
method to reorder the branches. Still, the method here may be "more suitable" in specific circumstances (when the data depicts a clear gradient).
Parameters
-
data: A rectangular matrix containing non-negative values (may be negative ifsquared_order). -
order_rows: Whether to reorder the rows. -
order_cols: Whether to reorder the columns. -
squared_order: Whether to reorder to minimize the l2 norm (otherwise minimizes the l1 norm). -
same_order: Whether to apply the same order to both rows and columns. -
discount_outliers: Whether to do a final order phase discounting outlier values far from the diagonal. -
max_spin_count: How many times to retry improving the solution before giving up.
Returns
A tuple with two vectors, which contain the order of the rows and the columns.
Slanter.slanted_reorder
—
Function
slanted_reorder(
data::AbstractMatrix{T};
order_data::Union{AbstractMatrix{<:Real}, Nothing}=nothing,
order_rows::Bool=true,
order_cols::Bool=true,
squared_order::Bool=true,
same_order::Bool=false,
discount_outliers::Bool=true
)::AbstractMatrix{T} where {T<:Real}
Reorder data rows and columns to move high values close to the diagonal.
Given a matrix expressing the cross-similarity between two (possibly different) sets of entities, this uses
slanted_orders
to compute the "best" order for visualizing the matrix, then returns the reordered data.
Parameters
-
data: A rectangular matrix to reorder, of non-negative values (unlessorder_datais specified, orsquared_order)). -
order_data: An optional matrix of non-negative values of the same size to use for computing the orders (may be negative ifsquared_order). -
order_rows: Whether to reorder the rows. -
order_cols: Whether to reorder the columns. -
squared_order: Whether to reorder to minimize the l2 norm (otherwise minimizes the l1 norm). -
same_order: Whether to apply the same order to both rows and columns. -
discount_outliers: Whether to do a final order phase discounting outlier values far from the diagonal.
Returns
A matrix of the same shape whose rows and columns are a permutation of the input.
Slanter.reorder_hclust
—
Function
reorder_hclust(clusters, order)
Given a clustering of some data, and some ideal order we'd like to use to visualize it, reorder (but do not modify) the clustering to be as consistent as possible with this ideal order.
Parameters
-
clusters: The existing clustering of the data. -
order: The ideal order we'd like to see the data in.
Returns
A reordered clustering which is consistent, wherever possible, with the ideal order.
Slanter.EnhancedHclust
—
Module
Enhancements of [hclust]*https://github.com/JuliaStats/Clustering.jl/blob/master/src/hclust.jl)
Slanter.EnhancedHclust.ehclust
—
Function
ehclust(d::AbstractMatrix; [linkage], [uplo], [branchorder], [order]) -> Hclust
Enhanced [hclust]*https://github.com/JuliaStats/Clustering.jl/blob/master/src/hclust.jl). This is similar to
hclust
with the following extensions:
- If
branchorderis a vector ofRealnumbers, one per leaf, then we reorder the branches so that each leaf position would be as close as possible to itsbranchordervalue. Technically we compute a center of gravity for each node and reorder the tree such that that at each branch, the left sub-tree center of gravity is to the left (lower than) the center of gravity of the right sub-tree. - If
orderis specified, it must be a permutation of the 1:N leaf indices. This will be the final order of the result; that is, we constrain the tree so that each node covers a continuous range of leaves (by this order). If you specify an explicitbranchorder, this will rotate some nodes so the result will no longer be in the specified order, but the tree is still constrained as above. - If
groupsis a vector of strings, then we first cluster all the entries for each group together, then combine the results. This is mutually exclusive with specifying anorder. - If
groupsis a vector of integers, they are expected to cover a range 1:N. We again cluster each group separately, and then cluster the groups enforcing them to be in ascending order. Applyingbranchorderin this case will only reorder branches inside each group, preserving the ascending order between the groups.
The
groups
and
order
parameters are mutually exclusive.
using Test
using Clustering
using Distances
data = rand(4, 10)
distances = pairwise(Euclidean(), data; dims = 2)
result = hclust(distances)
eresult = ehclust(distances)
@test result.order == eresult.order
result = hclust(distances; linkage = :ward)
eresult = ehclust(distances; linkage = :ward)
@test result.order == eresult.order
println("OK")
# output
OK
using Test
using Distances
data = rand(4, 10)
distances = pairwise(Euclidean(), data; dims = 2)
positions = rand(10)
result = ehclust(distances; branchorder = positions)
merges_data = Vector{Tuple{Int, Float64}}(undef, 9)
for merge_index in 1:9
left = result.merges[merge_index, 1]
if left < 0
left_size = 1
left_center = positions[-left]
else
left_size, left_center = merges_data[left]
end
right = result.merges[merge_index, 2]
if right < 0
right_size = 1
right_center = positions[-right]
else
right_size, right_center = merges_data[right]
end
@test left_center <= right_center
merged_size = left_size + right_size
merged_center = (left_center * left_size + right_center * right_size) / merged_size
merges_data[merge_index] = (merged_size, merged_center)
end
println("OK")
# output
OK
using Test
using Distances
using Random
data = rand(4, 10)
distances = pairwise(Euclidean(), data; dims = 2)
order = collect(1:10)
shuffle!(order)
result = ehclust(distances; order)
@test result.order == order
result = ehclust(distances; branchorder = :r, order)
@test result.order != order
println("OK")
# output
OK
using Test
using Distances
using Random
data = rand(4, 10)
distances = pairwise(Euclidean(), data; dims = 2)
groups = ["A", "B"][rand(1:2, 10)]
result = ehclust(distances; groups)
a_indices = findall(groups[result.order] .== "A")
b_indices = findall(groups[result.order] .== "B")
@assert maximum(a_indices) < minimum(b_indices) || maximum(b_indices) < minimum(a_indices)
println("OK")
# output
OK
using Test
using Distances
using Random
data = rand(4, 10)
distances = pairwise(Euclidean(), data; dims = 2)
groups = rand(1:2, 10)
result = ehclust(distances; groups)
one_indices = findall(groups[result.order] .== 1)
two_indices = findall(groups[result.order] .== 2)
@assert maximum(one_indices) < minimum(two_indices)
groups = 3 .- groups
result = ehclust(distances; groups)
one_indices = findall(groups[result.order] .== 1)
two_indices = findall(groups[result.order] .== 2)
@assert maximum(one_indices) < minimum(two_indices)
bresult = ehclust(distances; branchorder = :r, groups)
@assert bresult.order != result.order
one_indices = findall(groups[bresult.order] .== 1)
two_indices = findall(groups[bresult.order] .== 2)
@assert maximum(one_indices) < minimum(two_indices)
println("OK")
# output
OK