data_colors.Rd
Given a matrix whose rows are observations/elements and whose columns are variables/measurements, compute a color for each row (or group of rows) such that the colors are distinct and more-similar colors roughly indicate more-similar data rows (or groups of rows).
data_colors(
data,
run_umap = TRUE,
groups = NULL,
minimal_saturation = 33,
minimal_lightness = 20,
maximal_lightness = 80
)
data: A matrix whose rows represent elements/observations and whose columns represent variables/measurements.
run_umap: A boolean specifying whether to run UMAP on the data to convert it to 3D (by default, TRUE). If FALSE, the data matrix must have exactly 3 columns and will be used as-is.
groups: An optional array with one entry per row, containing the identifier of the group that row belongs to.
minimal_saturation: Exclude colors whose saturation (hypot(a, b) in CIELAB color space) is less than this value (by default, 33).
minimal_lightness: Exclude colors whose lightness (l in CIELAB color space) is less than this value (by default, 20).
maximal_lightness: Exclude colors whose lightness (l in CIELAB color space) is more than this value (by default, 80).
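The three thresholds above can be checked for any given color. The following is a minimal sketch that assumes CIELAB values are obtained with base R's `grDevices::convertColor()`; the package itself may compute CIELAB differently. The helper `passes_color_thresholds` is hypothetical and is not part of chameleon. It keeps a color only if its saturation, computed as hypot(a, b), is at least `minimal_saturation` and its lightness lies between `minimal_lightness` and `maximal_lightness`.

```r
# Hypothetical helper (not part of chameleon): report which hex colors would
# survive the saturation and lightness thresholds, assuming CIELAB values
# come from grDevices::convertColor().
passes_color_thresholds <- function(hex,
                                    minimal_saturation = 33,
                                    minimal_lightness = 20,
                                    maximal_lightness = 80) {
  # col2rgb() returns a 3 x n matrix in 0..255; convertColor() expects one
  # color per row with sRGB channels in 0..1.
  srgb <- t(grDevices::col2rgb(hex)) / 255
  lab <- grDevices::convertColor(srgb, from = "sRGB", to = "Lab")
  lightness <- lab[, 1]
  # Saturation as hypot(a, b): the distance from the neutral gray axis.
  saturation <- sqrt(lab[, 2]^2 + lab[, 3]^2)
  unname(saturation >= minimal_saturation &
         lightness >= minimal_lightness &
         lightness <= maximal_lightness)
}

passes_color_thresholds(c("#FF0000", "#808080", "#FFFFFF", "#000000"))
```

With the default thresholds, pure red passes. Mid gray fails on saturation, white fails on maximal lightness, and black fails on minimal lightness.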
An array with one entry per row, whose names are the rownames of the matrix, containing the color of each row. If groups was specified, the array instead contains one entry per unique group identifier, whose names are the group identifiers converted with as.character, containing the color of each group.
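When groups is given, each row's color can be recovered by indexing the returned array with the character form of that row's group identifier. The sketch below uses a hypothetical hand-written `group_colors` vector that only mimics the documented shape (named by as.character of each group identifier). The hex values are placeholders, not real output.

```r
# Hypothetical per-group colors, shaped like the documented return value:
# names are as.character(group identifier). Values are placeholders.
group_colors <- c("1" = "#D86544", "2" = "#8B262B", "3" = "#EF5205")

# One group identifier per data row (numeric here, to show why the
# as.character conversion matters when indexing by name).
groups <- c(2, 1, 3, 2, 1)

# Look up each row's group by name to get one color per row.
row_colors <- unname(group_colors[as.character(groups)])
row_colors
```

Indexing with `group_colors[groups]` directly would select by position rather than by name. With numeric identifiers that happen to be 1, 2 and 3 this gives the same result here, but it silently breaks for any other identifiers.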
This is intended to provide a "reasonable" set of colors for "arbitrary" data, as a convenient default when investigating unknown data sets. It is not meant to replace hand-picked colors tailored for specific data (e.g. using red colors for "bad" rows and green colors for "good" rows).
All colors are kept distinct by packing the visible part of the CIELAB color space with the needed number of spheres, one per row (or group); the sphere centers are the chosen colors. To assign these colors to the data, the data is first reduced to 3D using UMAP (unless run_umap is FALSE). Principal component analysis is then used to represent both the chosen colors (the 3D sphere centers) and the 3D data as point clouds with coordinates in the range 0-1. Finally, a stable matching algorithm maps the two point clouds to each other, thereby assigning a color to each data row. If the data is grouped, the center of gravity of each group is used to generate a color for each group.
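The matching steps described above can be sketched in R under several assumptions. This is not the chameleon implementation, and all three helpers are hypothetical:

- `group_centers` takes the column means of each group's rows. The text does not say whether centers are taken before or after the UMAP step.
- `normalize_cloud` rotates a cloud onto its principal axes with `stats::prcomp()` and min-max rescales each axis to 0-1. Both clouds need at least 3 points so each ends up with 3 columns.
- `stable_match` uses Gale-Shapley, with each data point preferring nearer colors and each color preferring nearer data points, by Euclidean distance. This particular preference rule is an assumption.

```r
# Hypothetical helpers sketching the steps described above; not the
# chameleon implementation. Points are rows of numeric matrices.

# Center of gravity (column means) of each group's rows, one row per group,
# with rownames as.character(group identifier).
group_centers <- function(data, groups) {
  keys <- as.character(groups)
  sums <- rowsum(as.matrix(data), keys)
  counts <- as.vector(table(keys)[rownames(sums)])
  sums / counts  # divides row k of sums by counts[k]
}

# Rotate a point cloud onto its principal axes, then min-max rescale each
# axis to [0, 1]. Principal-axis signs are arbitrary, so this sketch does
# not try to align the orientation of two different clouds.
normalize_cloud <- function(points) {
  scores <- stats::prcomp(points)$x
  apply(scores, 2, function(axis) {
    span <- max(axis) - min(axis)
    if (span == 0) rep(0, length(axis)) else (axis - min(axis)) / span
  })
}

# Gale-Shapley stable matching: each data row proposes to colors from
# nearest to farthest; each color keeps the nearest row proposed so far.
# Returns, for each row of `rows`, the index of its matched row in `colors`.
stable_match <- function(rows, colors) {
  n <- nrow(rows)
  stopifnot(nrow(colors) == n, ncol(colors) == ncol(rows))
  all_distances <- as.matrix(stats::dist(rbind(rows, colors)))
  distance <- all_distances[seq_len(n), n + seq_len(n), drop = FALSE]
  preferences <- t(apply(distance, 1, order))  # row i: colors, nearest first
  next_choice <- rep(1L, n)
  holder <- rep(NA_integer_, n)  # holder[j]: row currently matched to color j
  free <- seq_len(n)
  while (length(free) > 0) {
    i <- free[1]
    free <- free[-1]
    j <- preferences[i, next_choice[i]]
    next_choice[i] <- next_choice[i] + 1L
    if (is.na(holder[j])) {
      holder[j] <- i
    } else if (distance[i, j] < distance[holder[j], j]) {
      free <- c(free, holder[j])  # displaced row proposes again later
      holder[j] <- i
    } else {
      free <- c(free, i)
    }
  }
  assignment <- integer(n)
  assignment[holder] <- seq_len(n)
  assignment
}
```

In this sketch, with `hex` holding the chosen colors and `lab_centers` their 3D sphere centers, row i of `data_3d` would receive `hex[stable_match(normalize_cloud(data_3d), normalize_cloud(lab_centers))[i]]`. A stable matching guarantees that no data point and color both prefer each other over their assigned partners. This is why the assignment tends to place similar rows on similar colors without giving any two rows the same color.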
chameleon::data_colors(stackloss)
#> [1] "#D86544" "#8B262B" "#EF5205" "#818F3F" "#C5699E" "#AF7E42" "#4D966F"
#> [8] "#3B9B3D" "#967ACB" "#752D7B" "#C95CCC" "#E55573" "#EF3F9F" "#FE3847"
#> [15] "#F517CC" "#8F71F9" "#721BA5" "#2387F9" "#C84DFA" "#3241A5" "#4A8CCA"