Silhouette Visualizer

The Silhouette Coefficient is used when the ground truth about the dataset is unknown, and it measures how dense and well separated the clusters produced by the model are. For each sample, the coefficient is the difference between the mean nearest-cluster distance $b$ and the mean intra-cluster distance $a$, normalized by the larger of the two: $(b - a) / \max(a, b)$. The overall score is the mean of the per-sample coefficients. It ranges from -1 to 1, where values near 1 indicate dense, well-separated clusters, values near 0 indicate overlapping clusters, and values near -1 indicate samples assigned to the wrong cluster.
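To make the definition concrete, the following is a minimal pure-Python sketch of the per-sample computation. The function name `silhouette_per_sample` is hypothetical and is not part of Yellowbrick or scikit-learn; the sketch assumes Euclidean distance and at least two clusters, and it scores a sample that is alone in its cluster as 0.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def silhouette_per_sample(points, labels):
    """Return s = (b - a) / max(a, b) for every point (illustrative helper)."""
    # Map each cluster label to the indices of its members.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            # Assumed convention: a sample alone in its cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]

# Two tight, well-separated clusters: every coefficient is close to 1.
good = silhouette_per_sample(points, [0, 0, 1, 1])
good_score = sum(good) / len(good)

# The same points with mixed-up assignments: every coefficient is negative.
bad = silhouette_per_sample(points, [0, 1, 0, 1])
bad_score = sum(bad) / len(bad)
```

Averaging the per-sample values, as in `good_score` and `bad_score`, gives the single score described above.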

The Silhouette Visualizer displays the silhouette coefficient for each sample on a per-cluster basis, visualizing which clusters are dense and which are not. This is particularly useful for determining cluster imbalance, or for selecting a value for $K$ by comparing multiple visualizers.
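The per-cluster view can be thought of as grouping the per-sample coefficients by label, sorting each group, and stacking the groups vertically. The sketch below illustrates that bookkeeping only; `group_silhouettes`, its `gap` parameter, and the `below_average` flag are hypothetical names chosen for illustration and do not reflect how Yellowbrick implements its drawing.

```python
def group_silhouettes(values, labels, gap=10):
    """Group per-sample coefficients into one sorted band per cluster.

    Returns the overall mean score and a list of bands in label order.
    """
    by_cluster = {}
    for value, label in zip(values, labels):
        by_cluster.setdefault(label, []).append(value)

    mean_score = sum(values) / len(values)
    bands = []
    y_lower = gap
    for label in sorted(by_cluster):
        members = sorted(by_cluster[label])
        y_upper = y_lower + len(members)
        bands.append({
            "label": label,
            "y_range": (y_lower, y_upper),  # vertical extent of the band
            "values": members,              # bar lengths, smallest first
            "size": len(members),           # band thickness shows imbalance
            # True when no member of the cluster reaches the overall mean.
            "below_average": max(members) < mean_score,
        })
        y_lower = y_upper + gap
    return mean_score, bands


# Precomputed example coefficients for seven samples in three clusters.
values = [0.8, 0.7, 0.9, 0.1, -0.2, 0.6, 0.5]
labels = [0, 0, 0, 1, 1, 2, 2]
mean_score, bands = group_silhouettes(values, labels)
weak_clusters = [band["label"] for band in bands if band["below_average"]]
```

Bands of very different thickness point to cluster imbalance, and a band that never reaches the overall mean suggests a poorly formed cluster.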

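from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_blobs
from yellowbrick.cluster import SilhouetteVisualizer

# Make a dataset of 8 blobs
X, y = make_blobs(centers=8)

# Instantiate the clustering model (6 clusters) and the visualizer
model = MiniBatchKMeans(6)
visualizer = SilhouetteVisualizer(model)

visualizer.fit(X)  # Fit the model and compute the silhouette scores
visualizer.poof()  # Draw/show/poof the figure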
[Figure: silhouette plot produced by the example above (silhouette.png)]
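When choosing a value for $K$, it can help to look at the average score for several candidates alongside the plots. The following sketch uses scikit-learn's `silhouette_score` directly rather than the visualizer; the candidate range, sample count, and `random_state` values are arbitrary assumptions made for reproducibility.

```python
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with 8 blob centers, as in the example above.
X, _ = make_blobs(n_samples=500, centers=8, random_state=0)

# Average silhouette score for each candidate number of clusters.
scores = {}
for k in range(2, 11):
    model = MiniBatchKMeans(n_clusters=k, n_init=3, random_state=0)
    cluster_labels = model.fit_predict(X)
    scores[k] = silhouette_score(X, cluster_labels)

# The candidate with the highest average score is a reasonable starting point.
best_k = max(scores, key=scores.get)
```

The average alone can hide a single weak cluster, so the per-cluster plots for the top few candidates are still worth comparing.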

API Reference

Implements visualizers that use the silhouette metric for cluster evaluation.

class yellowbrick.cluster.silhouette.SilhouetteVisualizer(model, ax=None, **kwargs)

Bases: yellowbrick.cluster.base.ClusteringScoreVisualizer

TODO: Document this class!

draw(labels)

Draw the silhouettes for each sample and the average score.

Parameters:
labels : array-like

An array with the cluster label for each silhouette sample, usually computed with predict(). Labels are not stored on the visualizer so that the figure can be redrawn with new data.

finalize()

Prepare the figure for rendering by setting the title and adjusting the limits on the axes, adding labels and a legend.

fit(X, y=None, **kwargs)

Fits the model and generates the silhouette visualization.

TODO: decide to use this method or the score method to draw. NOTE: Probably this would be better in score, but the standard score is a little different and I'm not sure how it's used.