Silhouette Visualizer
The Silhouette Coefficient is used when the ground truth about the dataset is unknown. It measures how dense and well separated the clusters computed by the model are. For each sample, the silhouette coefficient is the difference between the mean nearest-cluster distance and the mean intra-cluster distance, divided by the larger of the two. The overall score is the average of these per-sample coefficients. This produces a score between -1 and 1, where 1 indicates highly dense, well-separated clusters, scores near 0 indicate overlapping clusters, and -1 indicates completely incorrect clustering.
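To make the definition concrete, the sketch below computes the per-sample coefficient by hand and checks it against scikit-learn. With \(a(i)\) the mean intra-cluster distance and \(b(i)\) the mean nearest-cluster distance of sample \(i\), it computes \(s(i) = (b(i) - a(i)) / \max(a(i), b(i))\). The helper `silhouette_by_hand`, the KMeans clustering, and the blob parameters are illustrative assumptions. Euclidean distance is assumed, which is scikit-learn's default metric.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import pairwise_distances, silhouette_samples, silhouette_score


def silhouette_by_hand(X, labels):
    """Hypothetical helper: per-sample s(i) = (b(i) - a(i)) / max(a(i), b(i))."""
    D = pairwise_distances(X)  # Euclidean distances between every pair of samples
    clusters = np.unique(labels)
    s = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False  # exclude the sample itself from its own cluster
        if not same.any():
            continue  # singleton cluster: s(i) is left at 0
        a = D[i, same].mean()  # mean intra-cluster distance
        # mean distance to the nearest other cluster
        b = min(D[i, labels == k].mean() for k in clusters if k != labels[i])
        s[i] = (b - a) / max(a, b)
    return s


X, _ = make_blobs(n_samples=200, centers=4, random_state=0)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

manual = silhouette_by_hand(X, labels)
print(np.allclose(manual, silhouette_samples(X, labels)))  # per-sample values agree
print(manual.mean(), silhouette_score(X, labels))  # the score is the per-sample mean
```

The overall score is simply the mean of the per-sample values. That is why a single number can hide clusters that are much worse than average.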
The Silhouette Visualizer displays the silhouette coefficient for each sample on a per-cluster basis, showing which clusters are dense and which are not. This is particularly useful for detecting cluster imbalance, or for selecting a value for \(K\) by comparing multiple visualizers.
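As a rough numeric companion to comparing several visualizers, the sketch below computes the mean silhouette score for a range of \(K\) values with scikit-learn's `silhouette_score`. The dataset, the range of \(K\), `n_init=3`, and the seeds are assumptions chosen for illustration. The highest mean is only a hint. The per-cluster plots can still reveal thin or mostly negative clusters that the mean hides.

```python
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=500, centers=5, random_state=42)

# Mean silhouette score for each candidate number of clusters
scores = {}
for k in range(2, 9):
    labels = MiniBatchKMeans(n_clusters=k, n_init=3, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

for k, s in scores.items():
    print(f"K={k}: mean silhouette = {s:.3f}")

best_k = max(scores, key=scores.get)  # K with the highest mean score
print("highest mean silhouette at K =", best_k)
```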
from sklearn.datasets import make_blobs
from sklearn.cluster import MiniBatchKMeans
from yellowbrick.cluster import SilhouetteVisualizer

# Generate a synthetic dataset with 8 blobs
X, y = make_blobs(centers=8)

# Instantiate the clustering model (6 clusters) and the visualizer
model = MiniBatchKMeans(n_clusters=6)
visualizer = SilhouetteVisualizer(model)

visualizer.fit(X)  # Fit the training data to the visualizer
visualizer.poof()  # Draw and show the figure (renamed show() in Yellowbrick 1.0+)
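For a headless check of what the figure shows, the sketch below reproduces the same setup with scikit-learn only. It prints each cluster's size and mean silhouette and flags clusters that fall below the overall average. The `random_state` values and `n_init=3` are assumptions added so the numbers are reproducible. The example above does not set them. This sketch does not call Yellowbrick.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples

# Same data and model as the example above (8 blobs, 6 clusters), with fixed seeds
X, y = make_blobs(centers=8, random_state=42)
labels = MiniBatchKMeans(n_clusters=6, n_init=3, random_state=42).fit_predict(X)

sample_scores = silhouette_samples(X, labels)  # one coefficient per sample
average = sample_scores.mean()  # the overall silhouette score

summary = {}
for k in np.unique(labels):
    cluster_scores = sample_scores[labels == k]
    summary[int(k)] = (cluster_scores.size, cluster_scores.mean())
    flag = "below average" if cluster_scores.mean() < average else ""
    print(f"cluster {k}: n={cluster_scores.size:3d}  mean s={cluster_scores.mean():.3f}  {flag}")

print(f"overall average: {average:.3f}")
```

Because the data contains 8 blobs but the model uses 6 clusters, some clusters must absorb more than one blob. Those clusters are the likeliest candidates for lower mean scores.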
API Reference
Implements visualizers that use the silhouette metric for cluster evaluation.

class yellowbrick.cluster.silhouette.SilhouetteVisualizer(model, ax=None, **kwargs)
Bases: yellowbrick.cluster.base.ClusteringScoreVisualizer
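Displays the silhouette coefficient for each sample on a per-cluster basis. model is the scikit-learn clustering estimator to evaluate; ax is the matplotlib Axes to draw on (if None, the current axes are used); additional keyword arguments are passed to the base visualizer.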

draw(labels)
Draw the silhouettes for each sample and the average score.
Parameters: labels : array-like
An array with the cluster label for each silhouette sample, usually computed with predict(). Labels are not stored on the visualizer so that the figure can be redrawn with new data.
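The sketch below illustrates the kind of drawing `draw(labels)` describes, using matplotlib directly. It is not Yellowbrick's source. It draws one band of sorted silhouette coefficients per cluster and a dashed line at the average score. The helper `draw_silhouettes`, its `gap` parameter, and the dataset are hypothetical. Because the labels are passed in rather than stored, the same helper can redraw a different clustering on another axes.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples


def draw_silhouettes(ax, X, labels, gap=10):
    """Hypothetical helper: one horizontal band of sorted silhouettes per cluster."""
    scores = silhouette_samples(X, labels)
    y_lower = gap
    for k in np.unique(labels):
        values = np.sort(scores[labels == k])  # sorted values give each band its shape
        y_upper = y_lower + values.size
        ax.fill_betweenx(np.arange(y_lower, y_upper), 0, values, alpha=0.7)
        ax.text(-0.05, y_lower + 0.5 * values.size, str(k))  # label the band with its cluster
        y_lower = y_upper + gap  # leave a gap before the next cluster
    ax.axvline(scores.mean(), color="red", linestyle="--")  # average silhouette score
    ax.set_xlabel("silhouette coefficient")
    ax.set_ylabel("cluster label")
    return scores.mean()


X, _ = make_blobs(n_samples=300, centers=4, random_state=1)
fig, axes = plt.subplots(1, 2, figsize=(10, 4))

# Same data, two different label arrays, drawn side by side
avg3 = draw_silhouettes(axes[0], X, KMeans(n_clusters=3, n_init=10, random_state=1).fit_predict(X))
avg4 = draw_silhouettes(axes[1], X, KMeans(n_clusters=4, n_init=10, random_state=1).fit_predict(X))
```

Call `fig.savefig("silhouettes.png")` to write the figure to disk.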
