Silhouette Visualizer
The Silhouette Coefficient is used when the ground truth about the dataset is unknown, and it measures the density of the clusters computed by the model. The score is computed by averaging the silhouette coefficient for each sample, which is the difference between the mean nearest-cluster distance and the mean intra-cluster distance for that sample, normalized by the larger of the two values. This produces a score between -1 and 1, where 1 indicates highly dense clusters and -1 indicates completely incorrect clustering.
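The per-sample calculation described above can be sketched in plain Python. This is a minimal illustration and not Yellowbrick's or scikit-learn's implementation: the function name `silhouette_coefficients` and the toy points are hypothetical, Euclidean distance is assumed, and at least two clusters are required.

```python
from math import dist  # Euclidean distance, Python 3.8+


def silhouette_coefficients(points, labels):
    """Return s = (b - a) / max(a, b) for every sample (illustrative helper)."""
    # Group sample indices by cluster label
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            # Common convention: a sample alone in its cluster scores 0
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the sample's own cluster
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


# Two tight pairs of points, far apart from each other
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = silhouette_coefficients(points, [0, 0, 1, 1])  # natural grouping
bad = silhouette_coefficients(points, [0, 1, 0, 1])   # neighbors split apart

print(sum(good) / len(good))  # mean score for the natural grouping
print(sum(bad) / len(bad))    # mean score for the poor grouping
```

The natural grouping yields a mean score close to 1, while splitting neighbors into different clusters pushes every sample's coefficient below zero.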
The Silhouette Visualizer displays the silhouette coefficient for each sample on a per-cluster basis, visualizing which clusters are dense and which are not. This is particularly useful for determining cluster imbalance, or for selecting a value for \(K\) by comparing multiple visualizers.
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_blobs
from yellowbrick.cluster import SilhouetteVisualizer
# Generate synthetic dataset with 8 random clusters
X, y = make_blobs(n_samples=1000, n_features=12, centers=8, random_state=42)
# Instantiate the clustering model and visualizer
model = MiniBatchKMeans(6)
visualizer = SilhouetteVisualizer(model)
visualizer.fit(X) # Fit the data to the visualizer
visualizer.poof() # Draw/show/poof the data
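Selecting a value for K can also be approached numerically by comparing the mean silhouette score across several candidate values. The sketch below uses scikit-learn directly rather than the visualizer; the dataset parameters and the candidate range of 2 to 7 are illustrative assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with 4 tight clusters (illustrative values)
X, _ = make_blobs(
    n_samples=500, n_features=2, centers=4, cluster_std=0.5, random_state=42
)

# Fit one model per candidate K and record the mean silhouette score
scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The candidate with the highest mean score is the preferred K
best_k = max(scores, key=scores.get)
for k, score in scores.items():
    print(k, round(score, 3))
print("best K:", best_k)
```

The mean score alone hides per-cluster detail, which is why plotting one visualizer per candidate K remains useful for spotting imbalanced or poorly formed clusters.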
API Reference
Implements visualizers that use the silhouette metric for cluster evaluation.

class yellowbrick.cluster.silhouette.SilhouetteVisualizer(model, ax=None, colors=None, **kwargs)
Bases: yellowbrick.cluster.base.ClusteringScoreVisualizer
The Silhouette Visualizer displays the silhouette coefficient for each sample on a per-cluster basis, visually evaluating the density and separation between clusters. The score is calculated by averaging the silhouette coefficient for each sample, which is the difference between the mean nearest-cluster distance and the mean intra-cluster distance for that sample, normalized by the larger of the two values. This produces a score between -1 and +1, where scores near +1 indicate high separation and scores near -1 indicate that the samples may have been assigned to the wrong cluster.
In SilhouetteVisualizer plots, clusters with higher scores have wider silhouettes, but clusters that are less cohesive will fall short of the average score across all clusters, which is plotted as a vertical dotted red line.
This is particularly useful for determining cluster imbalance, or for selecting a value for K by comparing multiple visualizers.
Parameters:  model : a Scikit-Learn clusterer
Should be an instance of a centroidal clustering algorithm (KMeans or MiniBatchKMeans).
 ax : matplotlib Axes, default: None
The axes to plot the figure on. If None is passed in, the current axes will be used (or generated if required).
 colors : iterable or string, default: None
A collection of colors to use for each cluster group. If there are fewer colors than cluster groups, colors will repeat. May also be a Yellowbrick or matplotlib colormap string.
 kwargs : dict
Keyword arguments that are passed to the base class and may influence the visualization as defined in other Visualizers.
Examples
>>> from yellowbrick.cluster import SilhouetteVisualizer
>>> from sklearn.cluster import KMeans
>>> model = SilhouetteVisualizer(KMeans(10))
>>> model.fit(X)
>>> model.poof()
Attributes:  silhouette_score_ : float
Mean Silhouette Coefficient for all samples. Computed via scikit-learn's sklearn.metrics.silhouette_score.
 silhouette_samples_ : array, shape = [n_samples]
Silhouette Coefficient for each sample. Computed via scikit-learn's sklearn.metrics.silhouette_samples.
 n_samples_ : integer
Number of total samples in the dataset (X.shape[0])
 n_clusters_ : integer
Number of clusters (e.g. n_clusters or k value) passed to the internal scikit-learn model.
 y_tick_pos_ : array of shape (n_clusters,)
The computed center positions of each cluster on the y-axis
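The relationship between the score attributes can be checked with the scikit-learn functions named above. This is a sketch under assumed data and model settings; the local variable names stand in for the visualizer attributes and are not read from a fitted SilhouetteVisualizer.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples, silhouette_score

# Illustrative data and model settings
X, _ = make_blobs(n_samples=300, n_features=4, centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

sample_scores = silhouette_samples(X, labels)  # counterpart of silhouette_samples_
mean_score = silhouette_score(X, labels)       # counterpart of silhouette_score_
n_samples = X.shape[0]                         # counterpart of n_samples_
n_clusters = len(np.unique(labels))            # counterpart of n_clusters_

# The mean score is the average of the per-sample coefficients
print(np.isclose(sample_scores.mean(), mean_score))
```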

draw(self, labels)
Draw the silhouettes for each sample and the average score.
Parameters:  labels : array-like
An array with the cluster label for each silhouette sample, usually computed with predict(). Labels are not stored on the visualizer so that the figure can be redrawn with new data.
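A silhouette plot of this kind can be approximated with matplotlib alone. The following is a rough sketch, not Yellowbrick's draw implementation: the 10-row gap between clusters, the data, and the variable names (including `tick_positions`) are assumptions chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples

# Illustrative data; labels come from the fitted clusterer
X, _ = make_blobs(n_samples=300, n_features=4, centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
sample_scores = silhouette_samples(X, labels)

fig, ax = plt.subplots()
y_lower = 0
tick_positions = []
for cluster in np.unique(labels):
    # Sort each cluster's coefficients so the silhouette has a smooth edge
    values = np.sort(sample_scores[labels == cluster])
    y_upper = y_lower + len(values)
    ax.fill_betweenx(np.arange(y_lower, y_upper), 0, values)
    # Center of this cluster's band on the y-axis, used for its tick label
    tick_positions.append(y_lower + len(values) / 2)
    y_lower = y_upper + 10  # leave a gap before the next cluster

# Average score across all samples as a vertical dotted red line
ax.axvline(sample_scores.mean(), color="red", linestyle=":")
ax.set_yticks(tick_positions)
ax.set_yticklabels(np.unique(labels))
ax.set_xlabel("silhouette coefficient")
ax.set_ylabel("cluster label")
```

Calling `fig.savefig(...)` with a path of your choice writes the figure to disk. Because the labels are passed in rather than stored, the same drawing loop can be rerun with labels predicted for new data.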