yellowbrick.features package

Submodules

yellowbrick.features.base module

Base classes for feature visualizers and feature selection tools.

class yellowbrick.features.base.DataVisualizer(ax=None, features=None, classes=None, color=None, colormap=None, **kwargs)[source]

Bases: yellowbrick.features.base.FeatureVisualizer

Data Visualizers are a subclass of Feature Visualizers that plot the instances in feature space (also called data space, hence the name of the visualizer). Feature space is a multi-dimensional space defined by the columns of the instance-dependent input, X, which is passed to fit() and transform(). Instances can also be labeled by the target input, y, which is passed only to fit(). For that reason, most Data Visualizers perform their drawing in fit().

This class provides helper functionality related to target identification: whether or not the target is sequential or categorical, and mapping a color sequence or color set to the targets as appropriate. It also uses the fit method to call the drawing utilities.

Parameters:

ax : matplotlib Axes, default: None

The axis to plot the figure on. If None is passed in, the current axes will be used (or generated if required).

features : list, default: None

A list of feature names to use. If a DataFrame is passed to fit and features is None, feature names are selected as the columns of the DataFrame.

classes : list, default: None

A list of class names for the legend. If classes is None and a y value is passed to fit, then the classes are selected from the target vector.

color : list or tuple, default: None

Optional list or tuple of colors to colorize lines. Use either color to colorize the lines on a per-class basis or colormap to color them on a continuous scale.

colormap : string or cmap, default: None

Optional string or matplotlib cmap to colorize lines. Use either color to colorize the lines on a per-class basis or colormap to color them on a continuous scale.

kwargs : dict

Keyword arguments that are passed to the base class and may influence the visualization as defined in other Visualizers.

Notes

These parameters can be influenced later on in the visualization process, but can and should be set as early as possible.
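The target-identification helper described above (deciding whether the target is categorical or sequential, then mapping colors to it) can be sketched as follows. This is not Yellowbrick's own logic: the function names, the max_classes threshold, and the heuristic itself are assumptions chosen for illustration.

```python
from numbers import Real


def is_categorical(y, max_classes=10):
    """Guess whether a target vector is categorical rather than sequential.

    Hypothetical heuristic: non-numeric (or boolean) targets, or numeric
    targets with few distinct values, are treated as categorical.
    """
    distinct = set(y)
    if any(not isinstance(v, Real) or isinstance(v, bool) for v in distinct):
        return True
    return len(distinct) <= max_classes


def color_by_class(y, colors):
    """Map each distinct class label to a color, cycling if colors run out."""
    classes = sorted(set(y), key=str)
    return {label: colors[i % len(colors)] for i, label in enumerate(classes)}


palette = color_by_class(["b", "a", "b"], ["red", "blue"])
```

A categorical target would use a mapping like palette, one color per class, while a sequential target would instead be passed through a continuous colormap.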

fit(X, y=None, **kwargs)[source]

The fit method is the primary drawing input for the visualization, since it receives both the X and y data required for drawing, whereas the transform method receives only X.

Parameters:

X : ndarray or DataFrame of shape n x m

A matrix of n instances with m features

y : ndarray or Series of length n

An array or series of target or class values

kwargs : dict

Pass generic arguments to the drawing method

Returns:

self : instance

Returns the instance of the transformer/visualizer

class yellowbrick.features.base.FeatureVisualizer(ax=None, **kwargs)[source]

Bases: yellowbrick.base.Visualizer, sklearn.base.TransformerMixin

Base class for feature visualization to investigate features individually or together.

FeatureVisualizer is itself a transformer so that it can be used in a Scikit-Learn Pipeline to perform automatic visual analysis during build.

Accepts as input a DataFrame or Numpy array.

fit(X, y=None, **fit_params)[source]

This method performs preliminary computations in order to set up the figure or perform other analyses. It can also call drawing methods in order to set up various non-instance related figure elements.

This method must return self.

fit_transform_poof(X, y=None, **kwargs)[source]

Fit to data, transform it, then visualize it.

Fits the visualizer to X and y with optional parameters by passing in all of kwargs, then calls poof with the same kwargs. This method must return the result of the transform method.

transform(X)[source]

Primarily a pass-through to ensure that the feature visualizer will work in a pipeline setting. This method can also call drawing methods in order to ensure that the visualization is constructed.

This method must return a numpy array with the same shape as X.
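The contract described for fit, transform, and fit_transform_poof can be illustrated with a minimal stand-in class. PassThroughVisualizer is a hypothetical name, not part of Yellowbrick, and plain lists stand in for numpy arrays so the sketch has no dependencies.

```python
class PassThroughVisualizer:
    """Hypothetical stand-in that follows the fit/transform contract above."""

    def __init__(self):
        self.draw_calls = 0  # counts how often fit triggered drawing
        self.shown = False   # set once poof has been called

    def fit(self, X, y=None, **fit_params):
        # Preliminary computation (and optional drawing) happens here.
        self.draw_calls += 1
        return self  # fit must return self so calls can be chained

    def transform(self, X):
        # Pass-through: the data is returned unchanged, so its shape is kept.
        return X

    def poof(self, **kwargs):
        # A real visualizer would finalize and render the figure here.
        self.shown = True

    def fit_transform_poof(self, X, y=None, **kwargs):
        Xp = self.fit(X, y, **kwargs).transform(X)
        self.poof(**kwargs)
        return Xp  # the result of transform, as the contract requires


X = [[1.0, 2.0], [3.0, 4.0]]
viz = PassThroughVisualizer()
Xp = viz.fit_transform_poof(X, y=[0, 1])
```

Because transform hands the data back untouched, a visualizer of this shape can sit between other steps of a pipeline without altering what the later steps receive.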

yellowbrick.features.pcoords module

Implementations of parallel coordinates for multi-dimensional feature analysis. There are a variety of parallel coordinates from Andrews Curves to coordinates that optimize column order.

class yellowbrick.features.pcoords.ParallelCoordinates(ax=None, features=None, classes=None, color=None, colormap=None, vlines=True, vlines_kwds=None, **kwargs)[source]

Bases: yellowbrick.features.base.DataVisualizer

Parallel coordinates displays each feature as a vertical axis spaced evenly along the horizontal, and each instance as a line drawn between each individual axis.

Parameters:

ax : matplotlib Axes, default: None

The axis to plot the figure on. If None is passed in, the current axes will be used (or generated if required).

features : list, default: None

A list of feature names to use. If a DataFrame is passed to fit and features is None, feature names are selected as the columns of the DataFrame.

classes : list, default: None

A list of class names for the legend. If classes is None and a y value is passed to fit, then the classes are selected from the target vector.

color : list or tuple, default: None

Optional list or tuple of colors to colorize lines. Use either color to colorize the lines on a per-class basis or colormap to color them on a continuous scale.

colormap : string or cmap, default: None

Optional string or matplotlib cmap to colorize lines. Use either color to colorize the lines on a per-class basis or colormap to color them on a continuous scale.

vlines : boolean, default: True

Flag that determines whether the vertical lines are displayed.

vlines_kwds : dict, default: None

Options to style or display the vertical lines.

kwargs : dict

Keyword arguments that are passed to the base class and may influence the visualization as defined in other Visualizers.

Notes

These parameters can be influenced later on in the visualization process, but can and should be set as early as possible.

Examples

>>> visualizer = ParallelCoordinates()
>>> visualizer.fit(X, y)
>>> visualizer.transform(X)
>>> visualizer.poof()

draw(X, y, **kwargs)[source]

Called from the fit method, this method creates the parallel coordinates canvas and draws each instance and vertical lines on it.

finalize(**kwargs)[source]

Finalize executes any subclass-specific axes finalization steps. The user calls poof and poof calls finalize.

Parameters:

kwargs : dict

Generic keyword arguments.

yellowbrick.features.pcoords.parallel_coordinates(X, y=None, ax=None, features=None, classes=None, color=None, colormap=None, vlines=True, vlines_kwds=None, **kwargs)[source]

Displays each feature as a vertical axis and each instance as a line.

This helper function is a quick wrapper to utilize the ParallelCoordinates Visualizer (Transformer) for one-off analysis.

Parameters:

X : ndarray or DataFrame of shape n x m

A matrix of n instances with m features

y : ndarray or Series of length n

An array or series of target or class values

ax : matplotlib Axes, default: None

The axes to plot the figure on.

features : list of strings, default: None

The names of the features or columns

classes : list of strings, default: None

The names of the classes in the target

color : list or tuple of colors, default: None

Specify the colors for each individual class

colormap : string or matplotlib cmap, default: None

Sequential colormap for continuous target

vlines : bool, default: True

Display the vertical axis lines

vlines_kwds : dict, default: None

Keyword arguments to draw the vlines

kwargs : dict

Keyword arguments that are passed to the base class and may influence the visualization as defined in other Visualizers.

Returns:

ax : matplotlib axes

Returns the axes that the parallel coordinates were drawn on.
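The geometry behind parallel coordinates (one evenly spaced vertical axis per feature, one polyline per instance) can be sketched without any plotting library. The helper name parallel_coordinate_lines is hypothetical, and the sketch only computes the points a drawing routine would connect; it does not reproduce Yellowbrick's draw method.

```python
def parallel_coordinate_lines(X):
    """Return one polyline per instance as a list of (x, y) points.

    Feature j sits at horizontal position j, so the axes are evenly
    spaced; the vertical position is the raw feature value.
    """
    return [[(j, value) for j, value in enumerate(row)] for row in X]


X = [[5.1, 3.5, 1.4], [6.2, 2.9, 4.3]]
lines = parallel_coordinate_lines(X)
```

Each inner list could then be drawn as a single line, colored by the class or target value of its instance.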

yellowbrick.features.radviz module

Implements radviz for feature analysis.

yellowbrick.features.radviz.RadViz

alias of RadialVisualizer

class yellowbrick.features.radviz.RadialVisualizer(ax=None, features=None, classes=None, color=None, colormap=None, **kwargs)[source]

Bases: yellowbrick.features.base.DataVisualizer

RadViz is a multivariate data visualization algorithm that plots each axis uniformly around the circumference of a circle then plots points on the interior of the circle such that the point normalizes its values on the axes from the center to each arc.

Parameters:

ax : matplotlib Axes, default: None

The axis to plot the figure on. If None is passed in, the current axes will be used (or generated if required).

features : list, default: None

A list of feature names to use. If a DataFrame is passed to fit and features is None, feature names are selected as the columns of the DataFrame.

classes : list, default: None

A list of class names for the legend. If classes is None and a y value is passed to fit, then the classes are selected from the target vector.

color : list or tuple, default: None

Optional list or tuple of colors to colorize lines. Use either color to colorize the lines on a per-class basis or colormap to color them on a continuous scale.

colormap : string or cmap, default: None

Optional string or matplotlib cmap to colorize lines. Use either color to colorize the lines on a per-class basis or colormap to color them on a continuous scale.

kwargs : dict

Keyword arguments that are passed to the base class and may influence the visualization as defined in other Visualizers.

Notes

These parameters can be influenced later on in the visualization process, but can and should be set as early as possible.

Examples

>>> visualizer = RadViz()
>>> visualizer.fit(X, y)
>>> visualizer.transform(X)
>>> visualizer.poof()

draw(X, y, **kwargs)[source]

Called from the fit method, this method creates the radviz canvas and draws each instance as a class or target colored point, whose location is determined by the feature data set.
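How the feature values determine each point's location can be sketched with the commonly used RadViz formulation: every feature gets an anchor on the unit circle, and an instance is placed at the average of the anchors weighted by its (already normalized) feature values. Whether Yellowbrick's draw method matches this exactly is an assumption, and radviz_points is a hypothetical helper name.

```python
import math


def radviz_points(X):
    """Project rows of X (values already scaled to [0, 1]) into the unit circle."""
    m = len(X[0])
    # One anchor per feature, evenly spaced around the circumference.
    anchors = [
        (math.cos(2 * math.pi * j / m), math.sin(2 * math.pi * j / m))
        for j in range(m)
    ]
    points = []
    for row in X:
        total = sum(row)
        if total == 0:
            # Assumption: a row of zeros has no pull and sits at the center.
            points.append((0.0, 0.0))
            continue
        px = sum(v * cx for v, (cx, _) in zip(row, anchors)) / total
        py = sum(v * cy for v, (_, cy) in zip(row, anchors)) / total
        points.append((px, py))
    return points


points = radviz_points([[1.0, 0.0, 0.0], [0.5, 0.5, 0.5]])
```

In this sketch, an instance dominated by one feature lands on that feature's anchor, while an instance with equal values on every feature lands at the center of the circle.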

finalize(**kwargs)[source]

Finalize executes any subclass-specific axes finalization steps. The user calls poof and poof calls finalize.

Parameters:

kwargs : dict

Generic keyword arguments.

static normalize(X)[source]

MinMax normalization to fit a matrix in the space [0,1] by column.
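Column-wise MinMax normalization can be sketched in plain Python as below. The function name minmax_by_column is hypothetical, and the handling of constant columns (mapped to 0.0 to avoid dividing by zero) is an assumption that the source does not specify.

```python
def minmax_by_column(X):
    """Scale each column of X to [0, 1] using its own minimum and maximum."""
    columns = list(zip(*X))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for row in X:
        scaled.append([
            # Assumption: a constant column (zero range) is mapped to 0.0.
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, lows, highs)
        ])
    return scaled


scaled = minmax_by_column([[1.0, 10.0], [2.0, 20.0], [3.0, 40.0]])
```

After scaling, the smallest value in every column is 0 and the largest is 1, so features measured in different units can share the same radial scale.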

yellowbrick.features.radviz.radviz(X, y=None, ax=None, features=None, classes=None, color=None, colormap=None, **kwargs)[source]

Displays each feature as an axis around a circle surrounding a scatter plot whose points are each individual instance.

This helper function is a quick wrapper to utilize the RadialVisualizer (Transformer) for one-off analysis.

Parameters:

X : ndarray or DataFrame of shape n x m

A matrix of n instances with m features

y : ndarray or Series of length n

An array or series of target or class values

ax : matplotlib Axes, default: None

The axes to plot the figure on.

features : list of strings, default: None

The names of the features or columns

classes : list of strings, default: None

The names of the classes in the target

color : list or tuple of colors, default: None

Specify the colors for each individual class

colormap : string or matplotlib cmap, default: None

Sequential colormap for continuous target

Returns:

ax : matplotlib axes

Returns the axes that the radviz plot was drawn on.

yellowbrick.features.rankd module

Implements 1D (histograms) and 2D (joint plot) feature rankings.

class yellowbrick.features.rankd.Rank2D(ax=None, algorithm='pearson', features=None, colormap='RdBu_r', **kwargs)[source]

Bases: yellowbrick.features.base.FeatureVisualizer

Rank2D performs pairwise comparisons of each feature in the data set with a specific metric or algorithm (e.g. Pearson correlation) then returns them ranked as a lower left triangle diagram.

Parameters:

ax : matplotlib Axes, default: None

The axis to plot the figure on. If None is passed in, the current axes will be used (or generated if required).

algorithm : one of {'pearson', 'covariance'}, default: 'pearson'

The ranking algorithm to use. The default is Pearson correlation.

features : list

A list of feature names to use. If a DataFrame is passed to fit and features is None, feature names are selected as the columns of the DataFrame.

colormap : string or cmap, default: 'RdBu_r'

Optional string or matplotlib cmap to colorize lines. Use either color to colorize the lines on a per-class basis or colormap to color them on a continuous scale.

kwargs : dict

Keyword arguments that are passed to the base class and may influence the visualization as defined in other Visualizers.

Notes

These parameters can be influenced later on in the visualization process, but can and should be set as early as possible.

Examples

>>> visualizer = Rank2D()
>>> visualizer.fit(X, y)
>>> visualizer.transform(X)
>>> visualizer.poof()

draw(X, **kwargs)[source]

Draws the heatmap of the ranking matrix of variables.
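Since the ranking matrix is symmetric, a lower-left triangle display only needs the entries on or below the diagonal. The sketch below blanks out the rest; lower_triangle is a hypothetical helper, and keeping the diagonal is an assumption, since the source does not say whether the diagonal is drawn.

```python
def lower_triangle(R):
    """Replace entries above the diagonal of a square matrix with None."""
    return [
        [value if j <= i else None for j, value in enumerate(row)]
        for i, row in enumerate(R)
    ]


R = [
    [1.0, 0.8, -0.2],
    [0.8, 1.0, 0.1],
    [-0.2, 0.1, 1.0],
]
tri = lower_triangle(R)
```

A heatmap routine would then skip the None cells, leaving each feature pair shown exactly once.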

finalize(**kwargs)[source]

Finalize executes any subclass-specific axes finalization steps. The user calls poof and poof calls finalize.

Parameters:

kwargs: dict

generic keyword arguments

fit(X, y=None, **kwargs)[source]

The fit method gathers information about the state of the visualizer.

Parameters:

X : ndarray or DataFrame of shape n x m

A matrix of n instances with m features

y : ndarray or Series of length n

An array or series of target or class values

kwargs : dict

Pass generic arguments to the drawing method

Returns:

self : instance

Returns the instance of the transformer/visualizer

rank(X, algorithm=None)[source]

Returns the ranking of each pair of columns as an m by m matrix.

Parameters:

X : ndarray or DataFrame of shape n x m

A matrix of n instances with m features

algorithm : str or None

The ranking mechanism to use, or None for the default

Returns:

R : ndarray

The mxm ranking matrix of the variables
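As an illustration of what an m by m ranking matrix contains, the sketch below computes pairwise Pearson correlations between the columns of X in plain Python. The helper names pearson and rank_matrix are hypothetical, and this is not Yellowbrick's implementation.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def rank_matrix(X):
    """Return the m x m matrix of pairwise column correlations."""
    columns = list(zip(*X))
    return [[pearson(ci, cj) for cj in columns] for ci in columns]


# Column 1 rises with column 0; column 2 falls as column 0 rises.
X = [[1.0, 2.0, 9.0], [2.0, 4.0, 7.0], [3.0, 6.0, 5.0], [4.0, 8.0, 3.0]]
R = rank_matrix(X)
```

The result is symmetric with ones on the diagonal. Note that this sketch raises ZeroDivisionError for a constant column, whose variance is zero.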

ranking_methods = {'pearson': <function <lambda>>, 'covariance': <function <lambda>>}

transform(X, **kwargs)[source]

The transform method is the primary drawing hook for ranking classes.

Parameters:

X : ndarray or DataFrame of shape n x m

A matrix of n instances with m features

kwargs : dict

Pass generic arguments to the drawing method

Returns:

Xp : ndarray

The transformed matrix, X’

yellowbrick.features.rankd.rank2d(X, y=None, ax=None, algorithm='pearson', features=None, colormap='RdBu_r', **kwargs)[source]

Displays pairwise comparisons of features with the algorithm and ranks them in a lower-left triangle heatmap plot.

This helper function is a quick wrapper to utilize the Rank2D Visualizer (Transformer) for one-off analysis.

Parameters:

X : ndarray or DataFrame of shape n x m

A matrix of n instances with m features

y : ndarray or Series of length n

An array or series of target or class values

ax : matplotlib Axes, default: None

The axis to plot the figure on.

algorithm : one of {'pearson', 'covariance'}, default: 'pearson'

The ranking algorithm to use. The default is Pearson correlation.

features : list, default: None

A list of feature names to use. If a DataFrame is passed to fit and features is None, feature names are selected as the columns of the DataFrame.

colormap : string or cmap, default: 'RdBu_r'

Optional string or matplotlib cmap to colorize lines. Use either color to colorize the lines on a per-class basis or colormap to color them on a continuous scale.

Returns:

ax : matplotlib axes

Returns the axes that the ranking heatmap was drawn on.

yellowbrick.features.jointplot module

class yellowbrick.features.jointplot.JointPlotVisualizer(ax=None, feature=None, target=None, joint_plot='scatter', joint_args=None, xy_plot='hist', xy_args=None, size=6, ratio=5, space=0.2, **kwargs)[source]

Bases: yellowbrick.features.base.FeatureVisualizer

JointPlotVisualizer allows for a simultaneous visualization of the relationship between two variables and the distribution of each individual variable. The relationship is plotted along the joint axis and univariate distributions are plotted on top of the x axis and to the right of the y axis.

Parameters:

ax: matplotlib Axes, default: None

This is inherited from FeatureVisualizer but is defined within JointPlotVisualizer since there are three axes objects.

feature: string, default: None

The name of the X variable. If a DataFrame is passed to fit and feature is None, feature is selected as the column of the DataFrame. There must be only one column in the DataFrame.

target: string, default: None

The name of the Y variable. If target is None and a y value is passed to fit, then the target is selected from the target vector.

joint_plot: one of {'scatter', 'hex'}, default: 'scatter'

The type of plot to render in the joint axis. Currently, the choices are scatter and hex. Use scatter for small datasets and hex for large datasets.

joint_args: dict, default: None

Keyword arguments used for customizing the joint plot:

Property     Description
alpha        Transparency.
facecolor    Background color of the joint axis.
aspect       Aspect ratio.
fit          Used if scatter is selected for joint_plot to draw a best fit line. Values can be True or False. Uses Yellowbrick.bestfit.
estimator    Used if scatter is selected for joint_plot to determine the type of best fit line to use. Refer to Yellowbrick.bestfit for the types of estimators that can be used.
x_bins       Used if hex is selected to set the number of bins for the x value.
y_bins       Used if hex is selected to set the number of bins for the y value.
cmap         String or matplotlib cmap to colorize lines. Use either color to colorize the lines on a per-class basis or colormap to color them on a continuous scale.

xy_plot: one of {'hist'}, default: 'hist'

The type of plot to render along the x and y axes. Currently, the only choice is hist.

xy_args: dict, default: None

Keyword arguments used for customizing the x and y plots:

Property      Description
alpha         Transparency.
facecolor_x   Background color of the x axis.
facecolor_y   Background color of the y axis.
bins          Used to set the number of bins for the hist plot.
histcolor_x   Used to set the color for the histogram on the x axis.
histcolor_y   Used to set the color for the histogram on the y axis.

size: float, default: 6

Size of each side of the figure in inches

ratio: float, default: 5

Ratio of joint axis size to the x and y axes height

space: float, default: 0.2

Space between the joint axis and the x and y axes

kwargs : dict

Keyword arguments that are passed to the base class and may influence the visualization as defined in other Visualizers.

Notes

These parameters can be influenced later on in the visualization process, but can and should be set as early as possible.

Examples

>>> visualizer = JointPlotVisualizer()
>>> visualizer.fit(X, y)
>>> visualizer.poof()

draw(X, y, **kwargs)[source]

Sets up the layout for the joint plot. draw calls draw_joint and draw_xy to render the visualizations.

draw_joint(X, y, **kwargs)[source]

Draws the visualization for the joint axis.

draw_xy(X, y, **kwargs)[source]

Draws the visualization for the x and y axes.

finalize(**kwargs)[source]

Finalize executes any subclass-specific axes finalization steps. The user calls poof and poof calls finalize.

Parameters:

kwargs : dict

Generic keyword arguments.

fit(X, y, **kwargs)[source]

Sets up the X and y variables for the jointplot and checks to ensure that X and y are of the correct data type.

fit calls draw.

Parameters:

X : ndarray or DataFrame of shape n x 1

A matrix of n instances with 1 feature

y : ndarray or Series of length n

An array or series of the target value

kwargs: dict

Keyword arguments passed to the Scikit-Learn API.

poof(**kwargs)[source]

Creates the labels for the feature and target variables.