Clustering models are unsupervised methods that attempt to detect patterns in unlabeled data. There are two primary classes of clustering algorithms: agglomerative clustering links similar data points together, whereas centroidal clustering attempts to find centers or partitions in the data.
Yellowbrick provides the yellowbrick.cluster module to visualize and evaluate clustering behavior. Currently we provide several visualizers to evaluate centroidal mechanisms, particularly K-Means clustering, that help us choose an optimal value for the \(K\) parameter (the number of clusters):
- Elbow Method: fit the model for a range of \(K\) values, plot a scoring function against \(K\), and look for an “elbow” in the curve.
- Silhouette Visualizer: visualize the silhouette scores of each cluster in a single model.
- Intercluster Distance Maps: visualize the relative distance and size of clusters.
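To make the idea behind the elbow method concrete, here is a minimal pure-Python sketch that does not use Yellowbrick. It assumes the scoring function is distortion (the sum of squared distances from each point to its nearest center). The toy data set, the helper names (`sq_dist`, `init_centers`, `kmeans`, `distortion`) and the deterministic farthest-point initialization are all illustrative choices rather than part of the library.

```python
# Pure-Python sketch of the elbow method: run k-means for several K and
# record the distortion (sum of squared distances to the nearest center).

def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def init_centers(points, k):
    """Deterministic farthest-point initialization (an illustrative choice)."""
    centers = [points[0]]
    while len(centers) < k:
        # Pick the point whose nearest existing center is farthest away.
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns the final list of centers."""
    centers = init_centers(points, k)
    for _ in range(max_iter):
        # Assignment step: group each point with its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        new_centers = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g)) if g else c
            for g, c in zip(groups, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def distortion(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


# Toy data: three tight, well-separated groups of five points each.
offsets = [(-0.5, -0.5), (-0.5, 0.5), (0.5, -0.5), (0.5, 0.5), (0.0, 0.0)]
blob_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [(cx + dx, cy + dy) for cx, cy in blob_centers for dx, dy in offsets]

# One score per candidate K; plotting these against K gives the elbow curve.
scores = {k: distortion(points, kmeans(points, k)) for k in range(1, 7)}
for k, score in scores.items():
    print(k, round(score, 2))
```

On this toy data the distortion falls from 526 at \(K = 1\) to 6 at \(K = 3\), so any larger \(K\) can remove at most 6 more. The curve therefore flattens abruptly at \(K = 3\), which is the "elbow" and matches the three groups the data was built from.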
Because it is very difficult to score a clustering model, Yellowbrick visualizers wrap scikit-learn clusterer estimators and train them through the visualizer's fit() method.
Once the clustering model is trained, calling the visualizer's poof() method displays the clustering evaluation metric.