TGL kmeans with 'tidy' output

```
TGL_kmeans_tidy(
df,
k,
metric = "euclid",
max_iter = 40,
min_delta = 0.0001,
verbose = FALSE,
keep_log = FALSE,
id_column = FALSE,
reorder_func = "hclust",
add_to_data = FALSE,
hclust_intra_clusters = FALSE,
seed = NULL,
use_cpp_random = FALSE
)
```

- df
a data frame or a matrix. Each row is a single observation and each column is a dimension. The first column can contain an id for each observation (if id_column is TRUE); otherwise the rownames are used.

- k
number of clusters. Note that in some cases the algorithm might return fewer clusters than k.

- metric
distance metric for kmeans++ seeding. Can be 'euclid', 'pearson' or 'spearman'.

- max_iter
maximal number of iterations

- min_delta
minimal change in assignments (fraction out of all observations) to continue iterating

- verbose
display algorithm messages

- keep_log
keep algorithm messages in 'log' field

- id_column
`df`'s first column contains the observation id

- reorder_func
function to reorder the clusters. operates on each center and orders by the result. e.g. `reorder_func = mean` would calculate the mean of each center and then would reorder the clusters accordingly. If `reorder_func = hclust` the centers would be ordered by hclust of the euclidean distance of the correlation matrix, i.e. `hclust(dist(cor(t(centers))))`. If NULL, no reordering would be done.

- add_to_data
return also the original data frame with an extra 'clust' column with the cluster ids ('id' is the first column)

- hclust_intra_clusters
run hierarchical clustering within each cluster and return an ordering of the observations.

- seed
seed for the c++ random number generator

- use_cpp_random
use c++ random number generator instead of R's. This should be used only for backwards compatibility, as from version 0.4.0 onwards the default random number generator was changed to R.
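The two reordering modes described for `reorder_func` can be sketched in base R. The `centers` matrix below is a made-up example (rows are clusters, columns are dimensions), and the snippet mirrors the documented behaviour rather than the package's internal code.

```r
# Hypothetical cluster centers: 4 clusters (rows) in 3 dimensions (columns)
centers <- rbind(
  c(5.0, 5.2, 4.9),
  c(1.0, 1.1, 0.9),
  c(3.0, 2.9, 3.2),
  c(2.0, 2.1, 1.8)
)
rownames(centers) <- 1:4

# reorder_func = mean: summarise each center, then order clusters by the result
ord_mean <- order(apply(centers, 1, mean))

# reorder_func = hclust: order clusters by hierarchical clustering of the
# euclidean distance of the correlation matrix between centers
hc <- hclust(dist(cor(t(centers))))
ord_hclust <- hc$order

ord_mean    # cluster indices sorted by increasing mean
ord_hclust  # cluster indices in dendrogram order
```

Both results are permutations of the original cluster indices; only the labelling order of the clusters changes, not the assignments themselves.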

list with the following components:

- cluster:
tibble with an `id` column holding the observation id (`1:n` if no id column was supplied) and a `clust` column holding the cluster assigned to each observation.

- centers:
tibble with `clust` column and the cluster centers.

- size:
tibble with `clust` column and `n` column with the number of points in each cluster.

- data:
tibble with `clust` column and the original data frame (only if add_to_data = TRUE).

- log:
messages from the algorithm run (only if `keep_log = TRUE`).

- order:
tibble with 'id' column, 'clust' column, 'order' column with a new ordering of the observations and 'intra_clust_order' column with the order within each cluster. (only if hclust_intra_clusters = TRUE)
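As a rough illustration of how the components fit together, the sketch below uses a hand-built stand-in for the returned list. The values are invented and plain data frames are used instead of tibbles so that it runs without the package; only the component and column names come from the description above.

```r
# Stand-in for the documented return value (hypothetical values)
km <- list(
  cluster = data.frame(
    id = as.character(1:6),
    clust = c(1L, 1L, 2L, 2L, 2L, 3L)
  ),
  centers = data.frame(clust = 1:3, V1 = c(1.0, 2.0, 3.0), V2 = c(1.1, 2.1, 2.9)),
  size = data.frame(clust = 1:3, n = c(2L, 3L, 1L))
)

# The `size` component is the per-cluster count of rows in `cluster`
recount <- tabulate(km$cluster$clust)

# Attach each observation's center coordinates by joining on `clust`
assigned <- merge(km$cluster, km$centers, by = "clust")
head(assigned)
```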

```
# this line is only for CRAN checks
tglkmeans.set_parallel(1)
# create 5 clusters normally distributed around 1:5
d <- simulate_data(
n = 100,
sd = 0.3,
nclust = 5,
dims = 2,
add_true_clust = FALSE,
id_column = FALSE
)
head(d)
#> V1 V2
#> 1 0.9695561 1.1848463
#> 2 1.2218107 0.5552327
#> 3 0.4162643 1.0408562
#> 4 0.8653655 1.3884966
#> 5 1.0570451 0.6128050
#> 6 0.5543186 0.7068720
# cluster
km <- TGL_kmeans_tidy(d, k = 5, metric = "euclid", verbose = TRUE)
#> will generate seeds
#> generating seeds
#> at seed 0
#> add new core from 269 to 0
#> at seed 1
#> done update min distance
#> seed range 350 450
#> picked up 417 dist was 1.57597
#> add new core from 417 to 1
#> at seed 2
#> done update min distance
#> seed range 300 400
#> picked up 73 dist was 1.23917
#> add new core from 73 to 2
#> at seed 3
#> done update min distance
#> seed range 250 350
#> picked up 368 dist was 0.662498
#> add new core from 368 to 3
#> at seed 4
#> done update min distance
#> seed range 200 300
#> picked up 193 dist was 0.438478
#> add new core from 193 to 4
#> reassign after init
#> iter 0
#> iter 1 changed 20
#> iter 1
#> iter 2 changed 11
#> iter 2
#> iter 3 changed 3
#> iter 3
#> iter 4 changed 0
km
#> $centers
#> # A tibble: 5 × 3
#> clust V1 V2
#> <int> <dbl> <dbl>
#> 1 1 1.98 2.00
#> 2 2 3.98 3.98
#> 3 3 0.987 0.978
#> 4 4 3.07 3.02
#> 5 5 5.02 4.98
#>
#> $cluster
#> # A tibble: 500 × 2
#> id clust
#> <chr> <int>
#> 1 1 3
#> 2 2 3
#> 3 3 3
#> 4 4 3
#> 5 5 3
#> 6 6 3
#> 7 7 3
#> 8 8 3
#> 9 9 3
#> 10 10 3
#> # ℹ 490 more rows
#>
#> $size
#> # A tibble: 5 × 2
#> clust n
#> <int> <int>
#> 1 1 104
#> 2 2 99
#> 3 3 99
#> 4 4 98
#> 5 5 100
#>
```
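The `changed` counts in the verbose log above can be related to `min_delta`. The sketch below assumes the stopping rule is "continue while the fraction of reassigned observations exceeds `min_delta`"; the exact comparison used internally is not stated in this page, so treat it as an illustration only.

```r
# "changed" counts copied from the verbose log above (iterations 1 to 4)
changed <- c(20, 11, 3, 0)
n_obs <- 500          # number of observations in the example
min_delta <- 0.0001   # default value from the usage section

# Fraction of observations whose assignment changed in each iteration
frac <- changed / n_obs

# Assumed rule: stop at the first iteration where the fraction
# no longer exceeds min_delta
stop_iter <- which(frac <= min_delta)[1]

frac
stop_iter
```

Under this assumption the run stops once no assignment changes, which matches the log ending at `iter 4 changed 0`, well before `max_iter = 40` is reached.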