Intercluster Distance Maps
Intercluster distance maps display an embedding of the cluster centers in 2 dimensions with the distance to other centers preserved. That is, the closer two centers are in the visualization, the closer they are in the original feature space. The clusters are sized according to a scoring metric. By default, they are sized by membership, i.e. the number of instances that belong to each center. This gives a sense of the relative importance of clusters. Note, however, that two clusters overlapping in the 2D space does not imply that they overlap in the original feature space.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from yellowbrick.cluster import InterclusterDistance

# Generate a synthetic dataset of 12 blobs in 16 dimensions
X, y = make_blobs(centers=12, n_samples=1000, n_features=16, shuffle=True)

# Instantiate the clustering model (9 clusters) and the visualizer
visualizer = InterclusterDistance(KMeans(9))
visualizer.fit(X)   # Fit the training data to the visualizer
visualizer.poof()   # Draw/show/poof the data
API Reference
Implements Intercluster Distance Map visualizations.

class yellowbrick.cluster.icdm.InterclusterDistance(model, ax=None, min_size=400, max_size=25000, embedding='mds', scoring='membership', legend=True, legend_loc='lower left', legend_size=1.5, random_state=None, **kwargs)
Bases: yellowbrick.cluster.base.ClusteringScoreVisualizer
Intercluster distance maps display an embedding of the cluster centers in 2 dimensions with the distance to other centers preserved. That is, the closer two centers are in the visualization, the closer they are in the original feature space. The clusters are sized according to a scoring metric. By default, they are sized by membership, i.e. the number of instances that belong to each center. This gives a sense of the relative importance of clusters. Note, however, that two clusters overlapping in the 2D space does not imply that they overlap in the original feature space.
Parameters:
 model : a scikit-learn clusterer
Should be an instance of a centroidal clustering algorithm (or a hierarchical algorithm with a specified number of clusters). Also accepts some other models like LDA for text clustering. If it is not a clusterer, an exception is raised.
 ax : matplotlib Axes, default: None
The axes to plot the figure on. If None is passed in, the current axes will be used (or generated if required).
 min_size : int, default: 400
The size, in points, of the smallest cluster drawn on the graph. Cluster sizes will be scaled between the min and max sizes.
 max_size : int, default: 25000
The size, in points, of the largest cluster drawn on the graph. Cluster sizes will be scaled between the min and max sizes.
 embedding : str, default: ‘mds’
The algorithm used to embed the cluster centers in 2 dimensional space so that the distances between clusters in the embedding reflect their relationships in feature space. Embedding algorithm options include:
 mds: multidimensional scaling
 tsne: t-distributed stochastic neighbor embedding
 scoring : str, default: ‘membership’
The scoring method used to determine the size of the clusters drawn on the graph so that the relative importance of clusters can be viewed. Scoring method options include:
 membership: number of instances belonging to each cluster
 legend : bool, default: True
Whether or not to draw the size legend onto the graph. Omit the legend to more easily see clusters that overlap.
 legend_loc : str, default: “lower left”
The location of the legend on the graph, used to move the legend out of the way of clusters into open space. The legend location options are the same as those used by matplotlib.
 legend_size : float, default: 1.5
The size, in inches, of the size legend to inset into the graph.
 random_state : int or RandomState, default: None
Fixes the random state for stochastic embedding algorithms.
 kwargs : dict
Keyword arguments that are passed to the base class and may influence the feature visualization properties.
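The min_size and max_size parameters bound the marker sizes drawn for the smallest and largest clusters. As a rough illustration only, the sketch below linearly rescales a set of cluster scores into that range. The helper name scale_sizes and the choice of linear interpolation are assumptions made for this example; the visualizer's own scaling may differ.

```python
import numpy as np


def scale_sizes(scores, min_size=400, max_size=25000):
    """Linearly rescale scores into [min_size, max_size].

    Hypothetical helper for illustration; not part of Yellowbrick.
    """
    scores = np.asarray(scores, dtype=float)
    span = scores.max() - scores.min()
    if span == 0:
        # Every cluster has the same score, so use the midpoint size for all.
        return np.full(scores.shape, (min_size + max_size) / 2.0)
    return min_size + (scores - scores.min()) / span * (max_size - min_size)


# Three clusters with 10, 20, and 30 members respectively.
sizes = scale_sizes([10, 20, 30])
print(sizes)
```

With this scheme the least populated cluster is drawn at min_size, the most populated at max_size, and the rest fall proportionally in between.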
Notes
Currently the only two embeddings supported are MDS and TSNE. Soon to follow will be PCoA and a customized version of PCoA for LDA. The only supported scoring metric is membership, but in the future, silhouette scores and cluster diameter will be added.
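Membership scoring counts the instances assigned to each cluster. A minimal sketch of that count, assuming integer cluster labels like those a fitted clusterer stores in labels_ (the label array here is made up for the example):

```python
import numpy as np

# Hypothetical cluster assignments for eight instances across three clusters.
labels = np.array([0, 2, 1, 0, 0, 2, 1, 0])
n_clusters = 3

# Count the instances per cluster; minlength keeps empty clusters as zeros.
membership = np.bincount(labels, minlength=n_clusters)
print(membership)
```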
In terms of algorithm support, right now any clustering algorithm that has a learned cluster_centers_ and labels_ attribute will work with the visualizer. In the future, we will update this to work with hierarchical clusterers that have n_components and LDA.
Attributes:
 cluster_centers_ : array of shape (n_clusters, n_features)
Searches for or creates cluster centers for the specified clustering algorithm.
 embedded_centers_ : array of shape (n_clusters, 2)
The positions of all the cluster centers on the graph.
 scores_ : array of shape (n_clusters,)
The scores of each cluster that determine its size on the graph.
 fit_time_ : Timer
The time it took to fit the clustering model and perform the embedding.
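The Notes above state that any clusterer with learned cluster_centers_ and labels_ attributes works with the visualizer. One way to test a model for that requirement before passing it in is a simple attribute check. The function is_supported and the stub classes below are hypothetical and are not Yellowbrick's own validation.

```python
REQUIRED_ATTRIBUTES = ("cluster_centers_", "labels_")


def is_supported(model):
    """Return True if a fitted model exposes both learned attributes.

    Hypothetical check for illustration; not part of Yellowbrick.
    """
    return all(hasattr(model, name) for name in REQUIRED_ATTRIBUTES)


class FittedStub:
    """Stand-in for a fitted centroidal clusterer."""

    cluster_centers_ = [[0.0, 0.0], [1.0, 1.0]]
    labels_ = [0, 1, 1]


class UnfittedStub:
    """Stand-in for a model with no learned attributes."""


print(is_supported(FittedStub()), is_supported(UnfittedStub()))
```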

cluster_centers_
Searches for or creates cluster centers for the specified clustering algorithm. This algorithm ensures that the centers are appropriately drawn and scaled so that distances between clusters are maintained.
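The description does not say how centers are created when a model does not provide them. One plausible approach, shown here purely as an assumption, is to take the mean of the instances assigned to each label. The helper centers_from_labels is hypothetical.

```python
import numpy as np


def centers_from_labels(X, labels):
    """Return one center per label: the mean of the instances with that label.

    Rows are ordered by sorted label value. Hypothetical helper for
    illustration; not part of Yellowbrick.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    return np.vstack([X[labels == k].mean(axis=0) for k in np.unique(labels)])


# Four 2D instances split evenly between two clusters.
X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [12.0, 14.0]])
labels = np.array([0, 0, 1, 1])
centers = centers_from_labels(X, labels)
print(centers)
```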

finalize()
Finalize the visualization to create an “origin grid” feel instead of the default matplotlib feel. Set the title, remove spines, and label the grid with components. This function also adds a legend from the sizes if required.

fit(X, y=None)
Fit the clustering model, computing the centers, then embed the centers into 2D space using the specified embedding method.
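The steps that fit describes can be approximated directly with scikit-learn. The following is a sketch of the general idea under stated assumptions (KMeans as the clusterer, MDS as the embedding, fixed random seeds), not the visualizer's actual implementation.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.manifold import MDS

X, _ = make_blobs(centers=12, n_samples=1000, n_features=16, random_state=0)

# Step 1: fit the clusterer so that cluster_centers_ and labels_ are learned.
model = KMeans(n_clusters=9, n_init=10, random_state=0).fit(X)

# Step 2: embed the 16-dimensional centers into 2 dimensions with MDS.
mds = MDS(n_components=2, random_state=0)
embedded_centers = mds.fit_transform(model.cluster_centers_)

# Step 3: score each cluster by membership (instances per cluster).
scores = np.bincount(model.labels_, minlength=9)

print(embedded_centers.shape, scores.sum())
```

The resulting arrays have the same shapes as the embedded_centers_ and scores_ attributes listed above: (n_clusters, 2) and (n_clusters,).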

lax
Returns the legend axes, creating it only on demand by creating a 2” by 2” inset axes that has no grid, ticks, spines or face frame (i.e. it is mostly invisible). The legend can then be drawn on this axes.
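An inset axes of this kind can be built with matplotlib's axes_grid1 toolkit. The sketch below is one way to do it, assuming a hard-coded 2 inch square placed in the lower left corner. It is not necessarily how the property is implemented.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1.inset_locator import inset_axes

fig, ax = plt.subplots()

# Create a 2in x 2in inset in the lower-left corner of the parent axes.
lax = inset_axes(ax, width=2.0, height=2.0, loc="lower left")

# Strip the grid, ticks, frame (spines and background) so that only
# content drawn on the inset later remains visible.
lax.grid(False)
lax.set_xticks([])
lax.set_yticks([])
lax.set_frame_on(False)
```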

transformer
Creates the internal transformer that maps the cluster centers' high-dimensional space to a two-dimensional space.
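A transformer like this can be selected from the embedding name. The sketch below assumes the two documented options map to scikit-learn's MDS and TSNE classes. The EMBEDDINGS table and the make_transformer function are hypothetical names introduced for this example.

```python
from sklearn.manifold import MDS, TSNE

# Assumed mapping from the documented embedding names to scikit-learn classes.
EMBEDDINGS = {"mds": MDS, "tsne": TSNE}


def make_transformer(embedding="mds", random_state=None):
    """Build a 2-component transformer for the named embedding.

    Hypothetical factory for illustration; not part of Yellowbrick.
    """
    try:
        cls = EMBEDDINGS[embedding.lower()]
    except KeyError:
        options = ", ".join(sorted(EMBEDDINGS))
        raise ValueError(
            f"unknown embedding '{embedding}'; choose from: {options}"
        ) from None
    return cls(n_components=2, random_state=random_state)


transformer = make_transformer("tsne", random_state=42)
print(type(transformer).__name__)
```

Passing random_state through to the transformer matters because both embeddings are stochastic, which is the reason the visualizer exposes that parameter.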