yellowbrick.regressor package

Submodules

yellowbrick.regressor.base module

Base classes for regressor Visualizers.

class yellowbrick.regressor.base.RegressionScoreVisualizer(model, ax=None, **kwargs)[source]

Bases: yellowbrick.base.ScoreVisualizer

Base class for all ScoreVisualizers that evaluate a regression estimator.

The primary functionality of this class is to perform a check to ensure the passed in estimator is a regressor, otherwise it raises a YellowbrickTypeError.
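
A minimal sketch of this check (the SVC classifier and the doctest-style output are illustrative; the exact error message may differ), using the PredictionError subclass documented below:

>>> from sklearn.svm import SVC
>>> from yellowbrick.regressor import PredictionError
>>> PredictionError(SVC())
Traceback (most recent call last):
    ...
YellowbrickTypeError: ...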

yellowbrick.regressor.residuals module

Regressor visualizers that score residuals: prediction vs. actual data.

class yellowbrick.regressor.residuals.PredictionError(model, ax=None, **kwargs)[source]

Bases: yellowbrick.regressor.base.RegressionScoreVisualizer

The prediction error visualizer plots the actual targets from the dataset against the predicted values generated by our model(s). This visualizer is used to detect noise or heteroscedasticity along a range of the target domain.

Parameters:

model : a Scikit-Learn regressor

Should be an instance of a regressor, otherwise the visualizer will raise a YellowbrickTypeError exception on instantiation.

ax : matplotlib Axes, default: None

The axes to plot the figure on. If None is passed in the current axes will be used (or generated if required).

point_color : color

Defines the color of the error points; can be any matplotlib color.

line_color : color

Defines the color of the best fit line; can be any matplotlib color.

kwargs : dict

Keyword arguments that are passed to the base class and may influence the visualization as defined in other Visualizers.

Notes

PredictionError is a ScoreVisualizer, meaning that it wraps a model and its primary entry point is the score() method.

Examples

>>> from yellowbrick.regressor import PredictionError
>>> from sklearn.linear_model import Lasso
>>> model = PredictionError(Lasso())
>>> model.fit(X_train, y_train)
>>> model.score(X_test, y_test)
>>> model.poof()
draw(y, y_pred)[source]
Parameters:

y : ndarray or Series of length n

An array or series of target or class values

y_pred : ndarray or Series of length n

An array or series of predicted target values

Returns:

ax : the axis with the plotted figure

finalize(**kwargs)[source]

Finalize executes any subclass-specific axes finalization steps. The user calls poof and poof calls finalize.

Parameters: kwargs: generic keyword arguments.

score(X, y=None, **kwargs)[source]

The score function is the hook for visual interaction. Pass in test data and the visualizer will create predictions on the data and evaluate them with respect to the test values. The evaluation will then be passed to draw() and the result of the estimator score will be returned.

Parameters:

X : array-like

X (also X_test) is the matrix of independent variables (features) from the test set used to generate predictions

y : array-like

y (also y_test) is the array of actual (dependent) target values to score the predictions against

Returns:

score : float

yellowbrick.regressor.residuals.prediction_error(model, X, y=None, ax=None, **kwargs)[source]

Quick method:

Plot the actual targets from the dataset against the predicted values generated by our model(s).

This helper function is a quick wrapper to utilize the PredictionError ScoreVisualizer for one-off analysis.

Parameters:

model : the Scikit-Learn estimator (should be a regressor)

X : ndarray or DataFrame of shape n x m

A matrix of n instances with m features.

y : ndarray or Series of length n

An array or series of target or class values.

ax : matplotlib Axes

The axes to plot the figure on.

Returns:

ax : matplotlib Axes

Returns the axes that the prediction error plot was drawn on.
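
A usage sketch of this quick method (assuming X and y are an already loaded regression dataset; the Lasso estimator is illustrative):

>>> from sklearn.linear_model import Lasso
>>> from yellowbrick.regressor.residuals import prediction_error
>>> ax = prediction_error(Lasso(), X, y)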

class yellowbrick.regressor.residuals.ResidualsPlot(model, ax=None, **kwargs)[source]

Bases: yellowbrick.regressor.base.RegressionScoreVisualizer

A residual plot shows the residuals on the vertical axis and the predicted values on the horizontal axis.

If the points are randomly dispersed around the horizontal axis, a linear regression model is appropriate for the data; otherwise, a non-linear model is more appropriate.

Parameters:

model : a Scikit-Learn regressor

Should be an instance of a regressor, otherwise the visualizer will raise a YellowbrickTypeError exception on instantiation.

ax : matplotlib Axes, default: None

The axes to plot the figure on. If None is passed in the current axes will be used (or generated if required).

train_color : color, default: ‘b’

Residuals for training data are plotted with this color but also given an opacity of 0.5 to ensure that the test data residuals are more visible. Can be any matplotlib color.

test_color : color, default: ‘g’

Residuals for test data are plotted with this color. In order to create generalizable models, reserved test data residuals are of the most analytical interest, so these points are highlighted by having full opacity. Can be any matplotlib color.

line_color : color, default: dark grey

Defines the color of the zero error line, can be any matplotlib color.

kwargs : dict

Keyword arguments that are passed to the base class and may influence the visualization as defined in other Visualizers.

Notes

ResidualsPlot is a ScoreVisualizer, meaning that it wraps a model and its primary entry point is the score() method.

Examples

>>> from yellowbrick.regressor import ResidualsPlot
>>> from sklearn.linear_model import Ridge
>>> model = ResidualsPlot(Ridge())
>>> model.fit(X_train, y_train)
>>> model.score(X_test, y_test)
>>> model.poof()
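
As a variation on the example above (a sketch only; the color values shown are illustrative), the train_color, test_color, and line_color parameters described above can be passed through the constructor:

>>> from yellowbrick.regressor import ResidualsPlot
>>> from sklearn.linear_model import Ridge
>>> model = ResidualsPlot(Ridge(), train_color='b', test_color='g', line_color='k')
>>> model.fit(X_train, y_train)
>>> model.score(X_test, y_test)
>>> model.poof()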
draw(y_pred, residuals, train=False, **kwargs)[source]
Parameters:

y_pred : ndarray or Series of length n

An array or series of predicted target values

residuals : ndarray or Series of length n

An array or series of the difference between the predicted and the target values

train : boolean

If False, draw assumes that the residual points being plotted are from the test data; if True, draw assumes the residuals are the train data.

Returns:

ax : the axis with the plotted figure

finalize(**kwargs)[source]

Finalize executes any subclass-specific axes finalization steps. The user calls poof and poof calls finalize.

Parameters: kwargs: generic keyword arguments.

fit(X, y=None, **kwargs)[source]
Parameters:

X : ndarray or DataFrame of shape n x m

A matrix of n instances with m features

y : ndarray or Series of length n

An array or series of target values

kwargs: keyword arguments passed to Scikit-Learn API.

score(X, y=None, train=False, **kwargs)[source]

Generates predicted target values using the Scikit-Learn estimator.

Parameters:

X : array-like

X (also X_test) is the matrix of independent variables (features) from the test set used to generate predictions

y : array-like

y (also y_test) is the array of actual (dependent) target values to score the predictions against

train : boolean

If False, score assumes that the residual points being plotted are from the test data; if True, score assumes the residuals are the train data.

Returns:

ax : the axis with the plotted figure

yellowbrick.regressor.residuals.residuals_plot(model, X, y=None, ax=None, **kwargs)[source]

Quick method:

Plot the residuals on the vertical axis and the predicted values on the horizontal axis.

This helper function is a quick wrapper to utilize the ResidualsPlot ScoreVisualizer for one-off analysis.

Parameters:

X : ndarray or DataFrame of shape n x m

A matrix of n instances with m features.

y : ndarray or Series of length n

An array or series of target or class values.

ax : matplotlib axes

The axes to plot the figure on.

model : the Scikit-Learn estimator (should be a regressor)

Returns:

ax : matplotlib axes

Returns the axes that the residuals plot was drawn on.
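
A usage sketch of this quick method (assuming X and y are an already loaded regression dataset; the Ridge estimator is illustrative):

>>> from sklearn.linear_model import Ridge
>>> from yellowbrick.regressor.residuals import residuals_plot
>>> ax = residuals_plot(Ridge(), X, y)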

yellowbrick.regressor.alphas module

Implements alpha selection visualizers for regularization

class yellowbrick.regressor.alphas.AlphaSelection(model, ax=None, **kwargs)[source]

Bases: yellowbrick.regressor.base.RegressionScoreVisualizer

The Alpha Selection Visualizer demonstrates how different values of alpha influence model selection during the regularization of linear models. Generally speaking, alpha increases the effect of regularization, e.g. if alpha is zero there is no regularization and the higher the alpha, the more the regularization parameter influences the final model.

Regularization is designed to penalize model complexity, therefore the higher the alpha, the less complex the model, decreasing the error due to variance (overfit). Alphas that are too high, on the other hand, increase the error due to bias (underfit). It is important, therefore, to choose an optimal alpha such that the error is minimized in both directions.

To do this, typically you would use one of the “RegressionCV” models in Scikit-Learn. E.g. instead of using the Ridge (L2) regularizer, you can use RidgeCV and pass a list of alphas, which will be selected based on the cross-validation score of each alpha. This visualizer wraps a “RegressionCV” model and visualizes the alpha/error curve. Use this visualization to detect if the model is responding to regularization, e.g. as you increase or decrease alpha, the model responds and error is decreased. If the visualization shows a jagged or random plot, then potentially the model is not sensitive to that type of regularization and another is required (e.g. L1 or Lasso regularization).

Parameters:

model : a Scikit-Learn regressor

Should be an instance of a regressor, and specifically one whose name ends with “CV”; otherwise the visualizer will raise a YellowbrickTypeError exception on instantiation. To use non-CV regressors see: ManualAlphaSelection.

ax : matplotlib Axes, default: None

The axes to plot the figure on. If None is passed in the current axes will be used (or generated if required).

kwargs : dict

Keyword arguments that are passed to the base class and may influence the visualization as defined in other Visualizers.

Notes

This class expects an estimator whose name ends with “CV”. If you wish to use some other estimator, please see the ManualAlphaSelection Visualizer for manually iterating through all alphas and selecting the best one.

This Visualizer hooks into the Scikit-Learn API during fit(). In order to pass a fitted model to the Visualizer, call the draw() method directly after instantiating the visualizer with the fitted model.

Note, each “RegressorCV” module has many different methods for storing alphas and error. This visualizer attempts to get them all and is known to work for RidgeCV, LassoCV, LassoLarsCV, and ElasticNetCV. If your favorite regularization method doesn’t work, please submit a bug report.

For RidgeCV, make sure store_cv_values=True.

Examples

>>> from yellowbrick.regressor import AlphaSelection
>>> from sklearn.linear_model import LassoCV
>>> model = AlphaSelection(LassoCV())
>>> model.fit(X, y)
>>> model.poof()
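
As a sketch of the RidgeCV note above (the alpha grid shown is illustrative), pass store_cv_values=True so the per-alpha cross-validation errors are available to the visualizer:

>>> import numpy as np
>>> from sklearn.linear_model import RidgeCV
>>> from yellowbrick.regressor import AlphaSelection
>>> alphas = np.logspace(-10, 1, 40)
>>> model = AlphaSelection(RidgeCV(alphas=alphas, store_cv_values=True))
>>> model.fit(X, y)
>>> model.poof()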
draw()[source]

Draws the alpha plot based on the values on the estimator.

finalize()[source]

Prepare the figure for rendering by setting the title as well as the X and Y axis labels and adding the legend.

fit(X, y, **kwargs)[source]

A simple pass-through method; calls fit on the estimator and then draws the alpha-error plot.

class yellowbrick.regressor.alphas.ManualAlphaSelection(model, ax=None, alphas=None, cv=None, scoring=None, **kwargs)[source]

Bases: yellowbrick.regressor.alphas.AlphaSelection

The AlphaSelection visualizer requires a “RegressorCV”, that is a specialized class that performs cross-validated alpha-selection on behalf of the model. If the regressor you wish to use doesn’t have an associated “CV” estimator, or for some reason you would like to specify more control over the alpha selection process, then you can use this manual alpha selection visualizer, which is essentially a wrapper for cross_val_score, fitting a model for each alpha specified.

Parameters:

model : a Scikit-Learn regressor

Should be an instance of a regressor, and specifically one whose name doesn’t end with “CV”. The regressor must support a call to set_params(alpha=alpha) and be fit multiple times. If the regressor name ends with “CV” a YellowbrickValueError is raised.

ax : matplotlib Axes, default: None

The axes to plot the figure on. If None is passed in the current axes will be used (or generated if required).

alphas : ndarray or Series, default: np.logspace(-10, 2, 200)

An array of alphas to fit each model with

cv : int, cross-validation generator or an iterable, optional

Determines the cross-validation splitting strategy. Possible inputs for cv are:

  • None, to use the default 3-fold cross validation,
  • integer, to specify the number of folds in a (Stratified)KFold,
  • An object to be used as a cross-validation generator.
  • An iterable yielding train, test splits.

This argument is passed to the sklearn.model_selection.cross_val_score method to produce the cross validated score for each alpha.

scoring : string, callable or None, optional, default: None

A string (see model evaluation documentation) or a scorer callable object / function with signature scorer(estimator, X, y).

This argument is passed to the sklearn.model_selection.cross_val_score method to produce the cross validated score for each alpha.

kwargs : dict

Keyword arguments that are passed to the base class and may influence the visualization as defined in other Visualizers.

Notes

This class does not take advantage of estimator-specific searching and is therefore less optimal and more time consuming than the regular “RegressorCV” estimators.

Examples

>>> from yellowbrick.regressor import ManualAlphaSelection
>>> from sklearn.linear_model import Ridge
>>> model = ManualAlphaSelection(
...     Ridge(), cv=12, scoring='neg_mean_squared_error'
... )
...
>>> model.fit(X, y)
>>> model.poof()
draw()[source]

Draws the alpha values against their associated error in a similar fashion to the AlphaSelection visualizer.

fit(X, y, **args)[source]

The fit method is the primary entry point for the manual alpha selection visualizer. It sets the alpha param for each alpha in the alphas list on the wrapped estimator, then scores the model using the passed in X and y data set. Those scores are then aggregated and drawn using matplotlib.
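
As a sketch extending the class example above (the alpha grid, estimator, and scoring string are illustrative), an explicit alphas array can be supplied in place of the np.logspace default:

>>> import numpy as np
>>> from sklearn.linear_model import Lasso
>>> from yellowbrick.regressor import ManualAlphaSelection
>>> alphas = np.logspace(-10, 1, 50)
>>> model = ManualAlphaSelection(
...     Lasso(), alphas=alphas, cv=5, scoring='neg_mean_squared_error'
... )
>>> model.fit(X, y)
>>> model.poof()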