PCA Projection
The PCA Decomposition visualizer uses principal component analysis to decompose high dimensional data into two or three dimensions so that each instance can be plotted in a scatter plot. Because the projected dataset is laid out along its axes of principal variation, the plot can also help determine whether spherical distance metrics are appropriate for the data.
import numpy as np
from yellowbrick.datasets import load_credit
from yellowbrick.features.pca import PCADecomposition
# Load the credit dataset as a feature matrix X and a target y
X, y = load_credit()
# Create an array of colors to assign to points in the plot
colors = np.array(['r' if yi else 'b' for yi in y])
visualizer = PCADecomposition(scale=True, color=colors)
visualizer.fit_transform(X, y)
visualizer.poof()
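For readers curious about what the projection computes, the following is a minimal NumPy sketch of scaling followed by PCA. It is not Yellowbrick's implementation: the helper name `project_2d` and the synthetic data are assumptions made purely for illustration.

```python
import numpy as np

def project_2d(X, scale=True):
    """Project the rows of X onto the first two principal components.

    Hypothetical helper for illustration; not part of Yellowbrick.
    """
    X = np.asarray(X, dtype=float)
    # Center each feature; optionally scale it to unit variance.
    X_centered = X - X.mean(axis=0)
    if scale:
        std = X_centered.std(axis=0)
        std[std == 0] = 1.0  # avoid dividing by zero for constant features
        X_centered = X_centered / std
    # Rows of Vt are the principal axes, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:2].T

# Synthetic stand-in data (not the credit dataset)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
X_demo[:, 1] = 2.0 * X_demo[:, 0] + 0.1 * rng.normal(size=200)

coords = project_2d(X_demo)
print(coords.shape)  # (200, 2)
```

Each row of `coords` is the pair of values that would be plotted as one point in the scatter plot.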
The PCA projection can also be plotted in three dimensions to attempt to visualize more principal components and get a better sense of the distribution in high dimensions.
import numpy as np
from yellowbrick.datasets import load_credit
from yellowbrick.features.pca import PCADecomposition
X, y = load_credit()
colors = np.array(['r' if yi else 'b' for yi in y])
visualizer = PCADecomposition(scale=True, color=colors, proj_dim=3)
visualizer.fit_transform(X, y)
visualizer.poof()
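One way to judge whether the third dimension is worth plotting is to compare how much variance two and three components retain. The sketch below uses scikit-learn's `PCA` and its `explained_variance_ratio_` attribute directly; the random data is a hypothetical stand-in, so the printed values say nothing about the credit dataset.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data; substitute your own feature matrix.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(300, 6))

# Scale first, then fit three principal components.
X_scaled = StandardScaler().fit_transform(X_demo)
pca = PCA(n_components=3).fit(X_scaled)

# Fraction of total variance captured by each component, in decreasing order.
ratios = pca.explained_variance_ratio_
print("2D retains:", ratios[:2].sum())
print("3D retains:", ratios.sum())
```

If the third ratio is small compared with the first two, the 2D plot already shows most of the structure that PCA can capture.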
Biplot
The PCA projection can be enhanced to a biplot whose points are the projected instances and whose vectors represent the structure of the data in high dimensional space. By using proj_features=True, vectors for each feature in the dataset are drawn on the scatter plot in the direction of the maximum variance for that feature. These structures can be used to analyze the importance of a feature to the decomposition or to find features of related variance for further analysis.
from yellowbrick.datasets import load_concrete
from yellowbrick.features.pca import PCADecomposition
# Load the concrete dataset
X, y = load_concrete()
visualizer = PCADecomposition(scale=True, proj_features=True)
visualizer.fit_transform(X, y)
visualizer.poof()
from yellowbrick.datasets import load_concrete
from yellowbrick.features.pca import PCADecomposition
X, y = load_concrete()
visualizer = PCADecomposition(scale=True, proj_features=True, proj_dim=3)
visualizer.fit_transform(X, y)
visualizer.poof()
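To make the feature vectors of a biplot more concrete, the sketch below computes one common form of them with NumPy: each feature's weights (loadings) on the first two principal axes. This is an illustrative assumption about how such vectors can be derived, not a copy of Yellowbrick's drawing code, and the feature names and data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
base = rng.normal(size=n)
# Hypothetical features: "a" and "b" move together, "c" is independent.
X_demo = np.column_stack([
    base + 0.1 * rng.normal(size=n),
    base + 0.1 * rng.normal(size=n),
    rng.normal(size=n),
])
feature_names = ["a", "b", "c"]

# Standardize, then take the top two principal axes from the SVD.
Z = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0)
_, _, Vt = np.linalg.svd(Z, full_matrices=False)
loadings = Vt[:2].T  # shape (n_features, 2): one 2D vector per feature

for name, (dx, dy) in zip(feature_names, loadings):
    print(f"{name}: ({dx:+.2f}, {dy:+.2f})")
```

Features with related variance, such as "a" and "b" here, end up with vectors pointing in nearly the same direction, which is the pattern to look for in the plot.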
API Reference
Decomposition based feature visualization with PCA.

class yellowbrick.features.pca.PCADecomposition(ax=None, features=None, scale=True, proj_dim=2, proj_features=False, color=None, colormap='RdBu', alpha=0.75, random_state=None, colorbar=False, heatmap=False, **kwargs)
Bases: yellowbrick.features.base.MultiFeatureVisualizer
Produce a two or three dimensional principal component plot of a data array projected onto its largest sequential principal components. It is common practice to scale the data array X before applying a PC decomposition. Variable scaling can be controlled using the scale argument.
Parameters:  ax : matplotlib Axes, default: None
The axes to plot the figure on. If None is passed in, the current axes will be used (or generated if required).
 features : list, default: None
A list of feature names to use. If a DataFrame is passed to fit and features is None, feature names are selected as the columns of the DataFrame.
 scale : bool, default: True
Whether to scale the data before applying the decomposition.
 proj_dim : int, default: 2
Number of dimensions of the projection, either 2 or 3.
 proj_features : bool, default: False
Whether to draw the features in the projected space. If True, the plot will be similar to a biplot.
 color : list or tuple of colors, default: None
Specify the colors for each individual class.
 colormap : string or cmap, default: 'RdBu'
Optional string or matplotlib cmap to colorize points. Use either color to colorize the points on a per class basis or colormap to color them on a continuous scale.
 alpha : float, default: 0.75
Specify a transparency where 1 is completely opaque and 0 is completely transparent. This property makes densely clustered points more visible.
 random_state : int, RandomState instance or None, optional (default None)
This parameter sets the random state on this solver. If the input X is larger than 500x500 and the number of components to extract is lower than 80% of the smallest dimension of the data, then the more efficient randomized solver is enabled.
 colorbar : bool, default: False
Add a colorbar that shows the range in magnitude of feature values for each component.
 heatmap : bool, default: False
Add a heatmap to explain which features contribute most to which component.
 kwargs : dict
Keyword arguments that are passed to the base class and may influence the visualization as defined in other Visualizers.
Examples
>>> from sklearn import datasets
>>> from yellowbrick.features.pca import PCADecomposition
>>> iris = datasets.load_iris()
>>> X = iris.data
>>> y = iris.target
>>> visualizer = PCADecomposition()
>>> visualizer.fit_transform(X, y)
>>> visualizer.poof()
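The solver rule given in the random_state description can be restated as a small predicate. The function below is a hypothetical helper that mirrors only the prose rule; it reads "larger than 500x500" as both dimensions exceeding 500, which is an assumption, and it is not scikit-learn's own selection code.

```python
def uses_randomized_solver(n_samples, n_features, n_components):
    """Return True when the rule described above would pick the randomized solver.

    Hypothetical helper that restates the documented rule; the actual
    selection logic lives inside scikit-learn and may differ by version.
    """
    # Assumed reading of "larger than 500x500": both dimensions exceed 500.
    larger_than_500 = n_samples > 500 and n_features > 500
    # Fewer components than 80% of the smallest dimension of the data.
    few_components = n_components < 0.8 * min(n_samples, n_features)
    return larger_than_500 and few_components

print(uses_randomized_solver(10_000, 1_000, 2))  # True
print(uses_randomized_solver(10_000, 20, 2))     # False
```

Under this reading, random_state only affects the result for large inputs, since small inputs never reach the randomized solver.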

draw(self, **kwargs)
Plots a scatter plot of points that represent the decomposition, pca_features_, of the original features, X, projected into either 2 or 3 dimensions.
If 2 dimensions are selected, a colorbar and heatmap can also be optionally included to show the magnitude of each feature value to the component.
Returns:  self : visualizer.ax
Returns the axes of the visualizer for use in Pipelines

finalize(self, **kwargs)
Draws the title, labels, legends, heatmap, and colorbar as specified by the keyword arguments.

fit(self, X, y=None, **kwargs)
Fits the PCA transformer, transforms the data in X, then draws the decomposition in either 2D or 3D space as a scatter plot.
Parameters:  X : ndarray or DataFrame of shape n x m
A matrix of n instances with m features.
 y : ndarray or Series of length n
An array or series of target or class values.
Returns:  self : visualizer
Returns self for use in Pipelines

transform(self, X, y=None, **kwargs)
Calls the internal transform method of the scikit-learn PCA transformer, which performs a dimensionality reduction on the input features X. Next calls the draw method of the Yellowbrick visualizer, finally returning a new array of transformed features of shape (len(X), proj_dim).
Parameters:  X : ndarray or DataFrame of shape n x m
A matrix of n instances with m features.
 y : ndarray or Series of length n
An array or series of target or class values.
Returns:  pca_features_ : ndarray or DataFrame of shape n x proj_dim
Returns a new array-like object of transformed features of shape (len(X), proj_dim).
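The output shape can be checked with plain scikit-learn. The sketch below assumes the internal transformer behaves like a scaler followed by PCA; that pipeline, the step names, and the random data are illustrative assumptions rather than the visualizer's actual internals.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

proj_dim = 2
rng = np.random.default_rng(7)
X_train = rng.normal(size=(120, 8))  # hypothetical training data
X_new = rng.normal(size=(30, 8))     # hypothetical unseen data

# Assumed stand-in for the visualizer's internal transformer.
reducer = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=proj_dim)),
])
reducer.fit(X_train)

# transform reuses the fitted scaling and principal axes on new rows.
X_new_2d = reducer.transform(X_new)
print(X_new_2d.shape)  # (30, 2)
```

The number of rows follows the input passed to transform, while the number of columns is fixed by proj_dim at fit time.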