tidyclust (development version)

  • Added butcher support for cluster_fit objects. axe_data() removes the training data stored in the fit, and axe_env() clears the environment reference from the preprocessing terms. (#126)

  • extract_cluster_assignment(), extract_centroids(), and predict() now accept a labels argument, a character vector of cluster labels that overrides the auto-generated prefix-based labels. (#148)

  • hier_clust() gains a dist_fun argument for specifying a custom distance function. (#70)

  • The dist_fun argument accepted by cluster metrics is now documented, including how to use {philentropy} to supply custom distance methods. See vignette("tuning_and_metrics", package = "tidyclust") for examples. (#185)

  • Added a "Getting started with tidyclust" vignette (vignette("tidyclust")). (#232)

  • contr_one_hot is now exported, fixing the indicators = "one_hot" code path in .convert_form_to_x_fit() and .convert_form_to_x_new(). (#218)

  • finalize_model_tidyclust() and finalize_workflow_tidyclust() are deprecated. Use tune::finalize_model() and tune::finalize_workflow() instead, which now support cluster_spec objects natively. (#223)

  • tune_cluster() now warns when passed an apparent() resample. Since tune 1.2.0, collect_metrics(summarize = TRUE) (the default) excludes metrics from apparent resamples, which caused unexpected NA values. Use collect_metrics(summarize = FALSE) to see per-resample metrics. (#193)

  • hier_clust() documentation now clarifies that predict() may not match extract_cluster_assignment() on training data. This is expected behavior: predict() uses a distance-based heuristic while extract_cluster_assignment() uses cutree() based on the dendrogram structure. (#208)

New Clustering Specifications

  • The db_clust() clustering specification has been added. This specification fits clusters with the DBSCAN algorithm using the dbscan engine. (#209)

  • The gm_clust() clustering specification has been added. This specification allows for the fitting of Gaussian mixture models using the mclust engine. (#209)

  • The mean_shift() clustering specification has been added. This specification fits clusters by iteratively shifting observations toward regions of high density, with the number of clusters determined automatically. The LPCM engine is used. (#240)

  • mean_shift() gains a new meanShiftR engine. (#244)

  • The .config column produced by tune_cluster() has changed from the Preprocessor{num}_Model{num} pattern to pre{num}_mod{num}_post{num} to align with updates in the tune package. (#220)

  • The foreach package is no longer supported for parallel processing in tune_cluster(). Use the future or mirai packages instead. See ?tune::parallelism for details. (#220)

  • tune_cluster() now supports parallel processing via the mirai package in addition to future. (#220)

  • The .notes column returned by tune_cluster() now includes a trace column containing backtraces for errors and warnings, making it easier to debug failures. (#220)

  • Fixed a bug when tuning the linkage_method argument. (#206, @lgaborini)

  • sse_within_total() now correctly applies a custom dist_fun when new_data is NULL by using training data stored in the model. (#184)

  • silhouette_avg() now has direction = "maximize" instead of direction = "zero", so that show_best() and select_best() correctly return models with the highest silhouette values. (#212, @dnldelarosa)

tidyclust 0.2.4

  • The philentropy package is now used to calculate distances rather than Rfast. (#199)

tidyclust 0.2.3

  • Update to fix revdep issue for clustMixType. (#190)

tidyclust 0.2.2

  • Update to fix revdep issue for ClusterR. (#186)

tidyclust 0.2.1

  • Small change to allow an easier CRAN release of the tune package. (#178)

tidyclust 0.2.0

New Engines

  • The clustMixType engine has been added to k_means(). This engine allows fitting of k-prototypes models. (#63)

  • The klaR engine has been added to k_means(). This engine allows fitting of k-modes models. (#63)

Improvements

  • Engine-specific documentation has been added for all models and engines. (#159)

Bug Fixes

  • Fixed a bug where engine-specific arguments were passed along for k_means() when the engine was ClusterR. (#142)

  • Fixed a bug where the prefix argument wasn't correctly passed through extract_cluster_assignment(), extract_centroids(), and predict(). (#145)

  • Metric functions now error informatively if used with unfit cluster specifications. (#146)

  • Fixed a bug that caused incorrect cluster ordering in extract_fit_summary(). (#136)

  • Using extract_cluster_assignment(), extract_centroids(), and predict() on a fitted hier_clust() model without specifying num_clusters or cut_height now gives a more informative error message. (#147)

  • k_means() now errors informatively if fit() is called without num_clusters specified. (#134)

  • Fixed a bug where the levels didn't match the number of clusters when predicting on fewer observations. (#158)

  • Fixed a bug where tune_cluster() would error if used with a recipe that contained non-predictor variables such as ID variables. (#124)

Breaking Changes

  • Exported internal functions ClusterR_kmeans_fit(), stats_kmeans_fit(), and hclust_fit() have been renamed to .k_means_fit_ClusterR(), .k_means_fit_stats(), and .hier_clust_fit_stats() to reduce visibility for users.

  • Cluster reordering is now done at fit time rather than at extraction and prediction time. (#154)

tidyclust 0.1.2

  • The cluster specification methods for generics::tune_args() and generics::tunable() are now registered unconditionally (#115).

tidyclust 0.1.1

  • Fixed a bug where extract_cluster_assignment() and predict() sometimes didn't agree on cluster assignments. (#94)

  • silhouette() and silhouette_avg() now return NAs instead of erroring when applied to a clustering object with 1 cluster. (#104)

  • Fixed a bug where extract_cluster_assignment() didn't work for hier_clust() models in workflows when num_clusters was specified in extract_cluster_assignment().

tidyclust 0.1.0

  • Added a NEWS.md file to track changes to the package.