PrePredict Estimators

Occasionally it is useful to be able to use predictions made during an inferencing workflow that does not involve Yellowbrick, for example when the inferencing process requires extra compute resources such as a cluster, or when the model takes a very long time to train and infer. In other instances, Yellowbrick simply does not support the model, even with the third-party estimator wrapper, or the results may have been collected from a source outside your control.

Some Yellowbrick visualizers can still create visual diagnostics from predictions that have already been made by using the PrePredict estimator from the contrib library, which is a simple wrapper around some data and an estimator type. Although not quite as straightforward as a scikit-learn metric of the form metric(y_true, y_pred), this estimator allows Yellowbrick to be used in the cases described above. An example is below:

# Import the prepredict estimator and a Yellowbrick visualizer
from yellowbrick.contrib.prepredict import PrePredict, CLASSIFIER
from yellowbrick.classifier import classification_report

# Instantiate the estimator with the pre-predicted data (y_pred and y_test come from your own workflow)
model = PrePredict(y_pred, CLASSIFIER)

# Use the visualizer, setting X to None since it is not required
oz = classification_report(model, None, y_test)
oz.show()

Warning

Many Yellowbrick visualizers inspect the estimator for learned attributes in order to deliver rich diagnostics. You may run into visualizers that cannot work with the PrePredict estimator; in those cases, you can manually set the learned attributes the visualizer requires on the PrePredict estimator.

If you have saved pre-predicted data to disk, the PrePredict estimator can load it using np.load. A full workflow is described below:

# Phase one: fit your estimator, make inferences, and save the inferences to disk
# (y_pred is the 1D array of predictions produced by your own workflow)
import numpy as np

np.save("y_pred.npy", y_pred)

# Import the prepredict estimator and a Yellowbrick visualizer
from yellowbrick.contrib.prepredict import PrePredict, REGRESSOR
from yellowbrick.regressor import prediction_error

# Instantiate the estimator with the pre-predicted data and pass a path to where
# the data has been saved on disk.
model = PrePredict("y_pred.npy", REGRESSOR)

# Use the visualizer; PrePredict ignores X, so X_test is passed only to fill the argument
oz = prediction_error(model, X_test, y_test)
oz.show()
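To make phase one concrete, here is a minimal, NumPy-only sketch of saving predictions and reading them back with np.load, the same loader PrePredict uses for paths. The synthetic data and the least-squares "model" are hypothetical stand-ins for whatever external process actually produced y_pred; Yellowbrick itself is not imported.

```python
import os
import tempfile

import numpy as np

# Hypothetical data standing in for an expensive external workflow
rng = np.random.default_rng(42)
X_train = rng.normal(size=(100, 3))
y_train = X_train @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)
X_test = rng.normal(size=(20, 3))

# Phase one: a plain NumPy least-squares fit stands in for the real model
coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
y_pred = X_test @ coef  # 1D array, one prediction per test sample

# Save the inferences to disk (a temporary directory keeps the example tidy)
path = os.path.join(tempfile.mkdtemp(), "y_pred.npy")
np.save(path, y_pred)

# Phase two: later, possibly on another machine, load the predictions back
loaded = np.load(path)
print(loaded.shape)  # (20,)
```

The reloaded array is the 1D shape PrePredict expects by default, so passing the same path to PrePredict should yield identical data.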

The data wrapped by PrePredict can be a callable, which is called to return the pre-predicted data; a str, file-like object, or pathlib.Path, from which the data is loaded using np.load; or the data itself, which is simply returned as-is. See the API reference for more details.
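The three input forms can be illustrated with a small dispatch function. This is a hypothetical sketch of the documented rules, not Yellowbrick's own code: resolve_data is an illustrative name, and checking for callables first is an assumption of the sketch.

```python
import io
from pathlib import Path

import numpy as np


def resolve_data(data):
    """Hypothetical helper (not Yellowbrick's code) following the documented
    rules: call callables, load paths and file-like objects with np.load,
    and return anything else unchanged."""
    if callable(data):
        return data()
    if isinstance(data, (str, Path)) or hasattr(data, "read"):
        return np.load(data)
    return data


y_pred = np.array([0, 1, 1, 0])

# In-memory data is returned as-is
direct = resolve_data(y_pred)

# A callable is invoked and its return value used
from_callable = resolve_data(lambda: y_pred)

# A file-like object (here an in-memory buffer) is read with np.load
buffer = io.BytesIO()
np.save(buffer, y_pred)
buffer.seek(0)
from_file = resolve_data(buffer)

print(from_file)  # [0 1 1 0]
```

A str or pathlib.Path pointing at a .npy file takes the same np.load branch as the buffer.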

API Reference

The PrePredict estimator allows Yellowbrick to work with results produced by an estimator before the visual diagnostic workflow, particularly for inferences that require extensive time or compute resources.

class yellowbrick.contrib.prepredict.PrePredict(data, estimator_type=None)

Bases: BaseEstimator

The PrePredict estimator allows users to pass pre-predicted results to Yellowbrick without needing the original estimator. Note that Yellowbrick often uses the learned attributes of the estimator to produce rich visual diagnostics, so this estimator may not work with all Yellowbrick visualizers.

PrePredict can accept data either in memory as a NumPy array or as a string, which it interprets as a path on disk from which to load the data.

Currently PrePredict does not support the predict_proba or decision_function methods, which it could if it were passed predicted data as a 2D array instead of a 1D array.

Parameters

data (array-like, func, file-like object, string, or pathlib.Path): The predicted values wrapped by the estimator, which are returned by predict() and used by score(). By default, data is expected to be a 1D NumPy array of y_hat or y_pred values produced by some other estimator. Data can also be a func, which is called and its result returned, or a file-like object, string, or pathlib.Path, in which case the data is loaded from disk using np.load.
estimator_type (str, optional): One of “classifier”, “regressor”, “clusterer”, “DensityEstimator”, or “outlier_detector”, which allows the contrib estimator to pass scikit-learn checks such as is_classifier. If not specified, the Yellowbrick visualizer you are trying to use may raise an error.

fit(X, y=None)[source]: Fit is a no-op, simply returning self per the scikit-learn API.

predict(X)[source]: Predict returns the embedded data but does not perform any checks on the validity of X (e.g. that it has the same shape as the internal data).

score(X, y=None)[source]: Score uses an appropriate metric for the estimator type and compares the input y values with the pre-predicted values.