Project new observations onto existing k-means cluster centers.
predict_tgl_kmeans(object, newdata, id_column = FALSE, ...)
object: A tgl_kmeans result from TGL_kmeans_tidy.
newdata: A matrix or data frame of new observations. Must have the same features (columns) as the data used to create the k-means model. If the first column contains observation IDs (character/factor), it will be used as the id column.
id_column: Does newdata's first column contain observation IDs? If TRUE, the first column is used as IDs. If FALSE (default), row numbers are used as IDs.
...: Additional arguments (currently unused).
A tibble with columns: id (observation identifier) and clust
(assigned cluster).
For each observation in newdata, the function computes the distance to every
cluster center and assigns the observation to the nearest center. The distance metric
used is the same one that was used when creating the k-means model ("euclid",
"pearson", or "spearman").
Distance formulas:
euclid: sqrt(sum((x - center)^2, na.rm = TRUE))
pearson: -cor(x, center, use = "pairwise.complete.obs")
spearman: -cor(x, center, method = "spearman", use = "pairwise.complete.obs")
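The assignment rule described above can be sketched in a few lines of base R. This is a minimal illustration, not the package's implementation: the helper name nearest_center, its arguments, and the toy centers and observations are all hypothetical, and the sketch assumes every observation has at least one computable distance.

```r
# Hypothetical helper (not part of the package): assign each row of `x`
# to the nearest row of `centers` using one of the three documented metrics.
nearest_center <- function(x, centers,
                           metric = c("euclid", "pearson", "spearman")) {
  metric <- match.arg(metric)
  x <- as.matrix(x)
  centers <- as.matrix(centers)

  # Distance between one observation and one center, following the
  # formulas listed above.
  dist_one <- function(obs, center) {
    switch(metric,
      euclid = sqrt(sum((obs - center)^2, na.rm = TRUE)),
      pearson = -cor(obs, center, use = "pairwise.complete.obs"),
      spearman = -cor(obs, center, method = "spearman",
                      use = "pairwise.complete.obs")
    )
  }

  # d[i, j] is the distance from observation i to center j.
  d <- vapply(
    seq_len(nrow(centers)),
    function(j) apply(x, 1, dist_one, center = centers[j, ]),
    numeric(nrow(x))
  )
  d <- matrix(d, nrow = nrow(x))

  # Index of the closest center for every observation.
  apply(d, 1, which.min)
}

# Toy data, made up for illustration only.
centers <- rbind(c(1, 2, 3, 4), c(4, 3, 2, 1), c(10, 10, 10, 11))
x <- rbind(c(1.1, 2.0, 2.9, 4.2), c(9, 10, 11, 12))

nearest_center(x, centers, "euclid")  # 1 3
nearest_center(x, centers, "pearson") # 1 1
```

The second observation shows why the metric matters: it is closest to the third center in Euclidean terms, but its profile is perfectly correlated with the first center, so the correlation-based distance assigns it there.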
# create 5 clusters normally distributed around 1:5
data <- simulate_data(n = 100, sd = 0.3, nclust = 5, dims = 10)
km <- TGL_kmeans_tidy(data[, -1], k = 5, id_column = FALSE, seed = 60427)
new_data <- simulate_data(n = 10, sd = 0.3, nclust = 5, dims = 10)
predictions <- predict_tgl_kmeans(km, new_data[, -1])
predictions
#> # A tibble: 50 × 2
#> id clust
#> <chr> <int>
#> 1 1 5
#> 2 2 5
#> 3 3 5
#> 4 4 5
#> 5 5 5
#> 6 6 5
#> 7 7 5
#> 8 8 5
#> 9 9 5
#> 10 10 5
#> # ℹ 40 more rows