Manifold Visualization
The Manifold visualizer provides high dimensional visualization using manifold learning to embed instances described by many dimensions into 2 dimensions, thus allowing the creation of a scatter plot that shows latent structures in data. Unlike decomposition methods such as PCA and SVD, manifolds generally use nearest-neighbors approaches to embedding, allowing them to capture nonlinear structures that would otherwise be lost. The projections that are produced can then be analyzed for noise or separability to determine if it is possible to create a decision space in the data.
The Manifold visualizer allows access to all currently available scikit-learn manifold implementations by specifying the manifold as a string to the visualizer. The currently implemented default manifolds are as follows:
Manifold  Description
"lle"  Locally Linear Embedding (LLE) uses many local linear decompositions to preserve globally nonlinear structures.
"ltsa"  LTSA LLE: local tangent space alignment is similar to LLE in that it uses locality to preserve neighborhood distances.
"hessian"  Hessian LLE: an LLE regularization method that applies a Hessian-based quadratic form at each neighborhood.
"modified"  Modified LLE applies a regularization parameter to LLE.
"isomap"  Isomap seeks a lower dimensional embedding that maintains geometric distances between each instance.
"mds"  MDS: multidimensional scaling uses similarity to plot points that are near to each other close in the embedding.
"spectral"  Spectral Embedding: a discrete approximation of the low dimensional manifold using a graph representation.
"tsne"  t-SNE: converts the similarity of points into probabilities, then uses those probabilities to create an embedding.
Each manifold algorithm produces a different embedding and takes advantage of different properties of the underlying data. Generally speaking, it takes multiple attempts on new data to determine the manifold that works best for the structures latent in your data. Note, however, that different manifold algorithms have different time, complexity, and resource requirements.
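As a rough illustration of trying several manifolds and comparing their cost, the sketch below fits three scikit-learn manifold estimators directly (not through the Manifold visualizer) on a small synthetic data set and reports how long each embedding takes. The data set, the choice of estimators, and the n_neighbors=10 setting are assumptions made for the example only.

```python
import time

from sklearn.datasets import make_s_curve
from sklearn.manifold import Isomap, LocallyLinearEmbedding, SpectralEmbedding

# Synthetic stand-in data: 300 points lying on a 3-D S-shaped surface.
X, _ = make_s_curve(n_samples=300, random_state=0)

# Candidate manifolds to compare (hyperparameters chosen for illustration).
candidates = {
    "lle": LocallyLinearEmbedding(n_neighbors=10, n_components=2, random_state=0),
    "isomap": Isomap(n_neighbors=10, n_components=2),
    "spectral": SpectralEmbedding(n_neighbors=10, n_components=2, random_state=0),
}

results = {}
for name, estimator in candidates.items():
    start = time.perf_counter()
    embedding = estimator.fit_transform(X)  # embed into 2 dimensions
    elapsed = time.perf_counter() - start
    results[name] = (embedding.shape, elapsed)
    print(f"{name:>8}: shape={embedding.shape}, fit time={elapsed:.3f}s")
```

Timings from a toy data set like this will not carry over to real data, but the same loop can be run on a sample of your own instances before committing to one algorithm.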
Manifolds can be used on many types of problems, and the color used in the scatter plot can describe the target instance. In an unsupervised or clustering problem, a single color is used to show structure and overlap. In a classification problem discrete colors are used for each class. In a regression problem, a color map can be used to describe points as a heat map of their regression values.
Discrete Target
In a classification or clustering problem, the instances can be described by discrete labels: the classes or categories in the supervised problem, or the clusters they belong to in the unsupervised version. The manifold visualizes this by assigning a color to each label and showing the labels in a legend.
from yellowbrick.features.manifold import Manifold

# Load the classification data set
# (load_data is assumed to be a dataset-loading helper defined elsewhere)
data = load_data('occupancy')

# Specify the features of interest
features = [
    "temperature", "relative humidity", "light", "CO2", "humidity"
]

# Extract the instances and target
X = data[features]
y = data.occupancy

# Embed with t-SNE and color each point by its discrete class label
visualizer = Manifold(manifold='tsne', target='discrete')
visualizer.fit_transform(X, y)
visualizer.poof()
The visualization also displays the amount of time it takes to generate the embedding; as you can see, this can take a long time even for relatively small datasets. One tip is to scale your data using the StandardScaler; another is to sample your instances (e.g. using train_test_split to preserve class stratification) or to filter features to decrease sparsity in the dataset.
One common mechanism is to use SelectKBest to select the features that have a statistical correlation with the target. For example, we can use the f_classif score to find the 3 best features in our occupancy dataset.
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import f_classif

# Select the 3 best features, then embed them with Isomap
model = Pipeline([
    ("selectk", SelectKBest(k=3, score_func=f_classif)),
    ("viz", Manifold(manifold='isomap', target='discrete')),
])

# Load the classification dataset
data = load_data("occupancy")

# Specify the features of interest
features = [
    "temperature", "relative humidity", "light", "CO2", "humidity"
]

# Extract the instances and target
X = data[features]
y = data.occupancy

model.fit(X, y)
model.named_steps['viz'].poof()
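The scaling and sampling tips mentioned earlier can be sketched with scikit-learn alone. The example below uses randomly generated stand-in data (an assumption, not the occupancy dataset), keeps a 20% stratified sample with train_test_split, and then standardizes the sampled features with StandardScaler before they would be handed to a manifold.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(42)

# Stand-in data: 1,000 instances, 5 features on very different scales,
# and a binary target (all values are synthetic).
X = rng.normal(size=(1000, 5)) * np.array([1.0, 10.0, 100.0, 1000.0, 0.1])
y = rng.randint(0, 2, size=1000)

# Keep a 20% sample; stratify=y preserves the class proportions.
X_sample, _, y_sample, _ = train_test_split(
    X, y, train_size=0.2, stratify=y, random_state=42
)

# Standardize each feature to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X_sample)

print(X_scaled.shape)
```

The scaled sample (X_scaled, y_sample) is what would then be passed to fit_transform in place of the full data set.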
Continuous Target
For a regression target or to specify color as a heatmap of continuous values, specify target='continuous'. Note that by default the param target='auto' is set, which determines if the target is discrete or continuous by counting the number of unique values in y.
# Assumes `data` already holds the regression data set as a DataFrame

# Specify the features of interest
feature_names = [
    'cement', 'slag', 'ash', 'water', 'splast', 'coarse', 'fine', 'age'
]
target_name = 'strength'

# Get the X and y data from the DataFrame
X = data[feature_names]
y = data[target_name]

# Embed with Isomap and color each point by its continuous target value
visualizer = Manifold(manifold='isomap', target='continuous')
visualizer.fit_transform(X, y)
visualizer.poof()
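To show why counting unique values can misjudge a target, here is a minimal sketch of such a heuristic. The function name guess_target_type and the max_classes threshold of 10 are hypothetical and are not taken from the Yellowbrick source; the actual rule used by target='auto' may differ.

```python
def guess_target_type(y, max_classes=10):
    """Hypothetical heuristic: few unique values means 'discrete'.

    The threshold is an assumption made for illustration only.
    """
    n_unique = len(set(y))
    return "discrete" if n_unique <= max_classes else "continuous"


# Binary class labels are recognized as discrete.
print(guess_target_type([0, 1, 0, 1, 1]))

# Fifty distinct real values are recognized as continuous.
print(guess_target_type([i * 0.5 for i in range(50)]))

# A regression target with only five distinct ratings would be
# treated as discrete by this kind of rule.
print(guess_target_type([1, 2, 3, 4, 5] * 4))
```

The last case is the kind of ambiguity that passing target='continuous' or target='discrete' explicitly avoids.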
API Reference
Use manifold algorithms for high dimensional visualization.

class yellowbrick.features.manifold.Manifold(ax=None, manifold='lle', n_neighbors=10, colors=None, target='auto', alpha=0.7, random_state=None, **kwargs)
Bases: yellowbrick.features.base.FeatureVisualizer
The Manifold visualizer provides high dimensional visualization for feature analysis by embedding data into 2 dimensions using the sklearn.manifold package for manifold learning. In brief, manifold learning algorithms are unsupervised approaches to nonlinear dimensionality reduction (unlike PCA or SVD) that help visualize latent structures in data.
The manifold algorithm used to do the embedding in scatter plot space can either be a transformer or a string representing one of the already specified manifolds as follows:
Manifold  Description
"lle"  Locally Linear Embedding
"ltsa"  LTSA LLE
"hessian"  Hessian LLE
"modified"  Modified LLE
"isomap"  Isomap
"mds"  MultiDimensional Scaling
"spectral"  Spectral Embedding
"tsne"  t-SNE
Each of these algorithms embeds nonlinear relationships in different ways, allowing for an exploration of various structures in the feature space. Note, however, that each of these algorithms has different time, memory and complexity requirements; take special care when using large datasets!
The Manifold visualizer also shows the specified target (if given) as the color of the scatter plot. If a classification or clustering target is given, then discrete colors will be used with a legend. If a regression or continuous target is specified, then a colormap and colorbar will be shown.
Parameters:
ax : matplotlib Axes, default: None
The axes to plot the figure on. If None, the current axes will be used or generated if required.
manifold : str or Transformer, default: “lle”
Specify the manifold algorithm to perform the embedding. Either one of the strings listed in the table above, or an actual scikit-learn transformer. The constructed manifold is accessible with the manifold property, so as to modify hyperparameters before fit.
n_neighbors : int, default: 10
Many manifold algorithms are nearest neighbors based; for those that are, this parameter specifies the number of neighbors to use in the embedding. If the manifold algorithm doesn’t use nearest neighbors, then this parameter is ignored.
colors : str or list of colors, default: None
Specify the colors used, though note that the specification depends very much on whether the target is continuous or discrete. If continuous, colors must be the name of a colormap. If discrete, then colors can be the name of a palette or a list of colors to use for each class in the target.
target : str, default: “auto”
Specify the type of target as either “discrete” (classes) or “continuous” (real numbers, usually for regression). If “auto”, the Manifold will attempt to determine the type by counting the number of unique values.
If the target is discrete, points will be colored by the target class and a legend will be displayed. If continuous, points will be displayed with a colormap and a color bar will be displayed. In either case, if no target is specified, only a single color will be drawn.
alpha : float, default: 0.7
Specify a transparency where 1 is completely opaque and 0 is completely transparent. This property makes densely clustered points more visible.
random_state : int or RandomState, default: None
Fixes the random state for stochastic manifold algorithms.
kwargs : dict
Keyword arguments passed to the base class that may influence the feature visualization properties.
Notes
Specifying the target as 'continuous' or 'discrete' will influence how the visualizer is finally displayed; don’t rely on the automatic determination from the Manifold!
Scaling your data with the StandardScaler before applying it to the visualizer is a great way of increasing performance. Additionally, using the SelectKBest transformer may also improve performance and lead to better visualizations.
Warning
Manifold visualizers have extremely varying time, resource, and complexity requirements. Sampling data or features may be necessary in order to finish a manifold computation.
See also
The scikit-learn discussion on Manifold Learning.
Examples
>>> viz = Manifold(manifold='isomap', target='discrete')
>>> viz.fit_transform(X, y)
>>> viz.poof()
Attributes:
fit_time_ : yellowbrick.utils.timer.Timer
The amount of time in seconds it took to fit the Manifold.
classes_ : np.ndarray, optional
If discrete, the classes identified in the target y.
range_ : tuple of floats, optional
If continuous, the maximum and minimum values in the target y.

ALGORITHMS = {
    'hessian': LocallyLinearEmbedding(eigen_solver='auto', hessian_tol=0.0001, max_iter=100, method='hessian', modified_tol=1e-12, n_components=2, n_jobs=None, n_neighbors=5, neighbors_algorithm='auto', random_state=None, reg=0.001, tol=1e-06),
    'isomap': Isomap(eigen_solver='auto', max_iter=None, n_components=2, n_jobs=None, n_neighbors=5, neighbors_algorithm='auto', path_method='auto', tol=0),
    'lle': LocallyLinearEmbedding(eigen_solver='auto', hessian_tol=0.0001, max_iter=100, method='standard', modified_tol=1e-12, n_components=2, n_jobs=None, n_neighbors=5, neighbors_algorithm='auto', random_state=None, reg=0.001, tol=1e-06),
    'ltsa': LocallyLinearEmbedding(eigen_solver='auto', hessian_tol=0.0001, max_iter=100, method='ltsa', modified_tol=1e-12, n_components=2, n_jobs=None, n_neighbors=5, neighbors_algorithm='auto', random_state=None, reg=0.001, tol=1e-06),
    'mds': MDS(dissimilarity='euclidean', eps=0.001, max_iter=300, metric=True, n_components=2, n_init=4, n_jobs=None, random_state=None, verbose=0),
    'modified': LocallyLinearEmbedding(eigen_solver='auto', hessian_tol=0.0001, max_iter=100, method='modified', modified_tol=1e-12, n_components=2, n_jobs=None, n_neighbors=5, neighbors_algorithm='auto', random_state=None, reg=0.001, tol=1e-06),
    'spectral': SpectralEmbedding(affinity='nearest_neighbors', eigen_solver=None, gamma=None, n_components=2, n_jobs=None, n_neighbors=None, random_state=None),
    'tsne': TSNE(angle=0.5, early_exaggeration=12.0, init='pca', learning_rate=200.0, method='barnes_hut', metric='euclidean', min_grad_norm=1e-07, n_components=2, n_iter=1000, n_iter_without_progress=300, perplexity=30.0, random_state=None, verbose=0),
}

draw(X, y=None)
Draws the points described by X and colored by the points in y. Can be called multiple times before finalize to add more scatter plots to the axes; however, fit() must be called before use.
Parameters:
X : array-like of shape (n, 2)
The matrix produced by the transform() method.
y : array-like of shape (n,), optional
The target, used to specify the colors of the points.
Returns:
self.ax : matplotlib Axes object
Returns the axes that the scatter plot was drawn on.

fit(X, y=None)
Fits the manifold on X and transforms the data to plot it on the axes. See fit_transform() for more details.

fit_transform(X, y=None)
Fits the manifold on X and transforms the data to plot it on the axes. The optional y specified can be used to declare discrete colors. If the target is set to ‘auto’, this method also determines the target type, and therefore what colors will be used.
Note also that fit records the amount of time it takes to fit the manifold and reports that information in the visualization.
Parameters:
X : array-like of shape (n, m)
A matrix or data frame with n instances and m features where m > 2.
y : array-like of shape (n,), optional
A vector or series with target values for each instance in X. This vector is used to determine the color of the points in X.
Returns:
self : Manifold
Returns the visualizer object.

manifold
Property containing the manifold transformer constructed from the supplied hyperparameter. Use this property to modify the manifold before fit with manifold.set_params().
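Because the manifold is a scikit-learn transformer, set_params() follows the standard scikit-learn estimator interface. The sketch below shows that call on a standalone Isomap instance rather than on the visualizer's manifold property; the specific n_neighbors values are chosen only for illustration.

```python
from sklearn.manifold import Isomap

# A standalone transformer of the kind that can also be passed
# to the visualizer in place of a string name.
transformer = Isomap(n_neighbors=5, n_components=2)

# Adjust a hyperparameter before fitting; set_params returns the estimator.
transformer.set_params(n_neighbors=15)

print(transformer.get_params()["n_neighbors"])
```

Any parameter listed by get_params() can be changed the same way, as long as the change is made before the transformer is fit.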