statsmodels is a Python library that provides utilities for the estimation of several statistical models and includes extensive results and metrics for each estimator. In particular, statsmodels excels at generalized linear models (GLMs) which are far superior to scikit-learn’s implementation of ordinary least squares.
This contrib module allows statsmodels users to take advantage of Yellowbrick visualizers by creating a wrapper class that implements the scikit-learn
BaseEstimator. Using the wrapper class, statsmodels can be passed directly to many visualizers, customized for the scoring and metric functionality required.
The statsmodel wrapper is currently a prototype and as such is currently a bit trivial. Many options and extra functionality such as weights are not currently handled. We are actively looking for statsmodels users to contribute to this package!
Using the statsmodels wrapper:
import statsmodels.api as sm from functools import partial from yellowbrick.regressor import ResidualsPlot from yellowbrick.contrib.statsmodels import StatsModelsWrapper glm_gaussian_partial = partial(sm.GLM, family=sm.families.Gaussian()) model = StatsModelsWrapper(glm_gaussian_partial) viz = ResidualsPlot(model) viz.fit(X_train, y_train) viz.score(X_test, y_test) viz.show()
You can also use fitted estimators with the wrapper to avoid having to pass a partial function:
from yellowbrick.regressor import prediction_error # Create the OLS model model = sm.OLS(y, X) # Get the detailed results results = model.fit() print(results.summary()) # Visualize the prediction error prediction_error(StatsModelWrapper(model), X, y, is_fitted=True)
This example also shows the use of a Yellowbrick oneliner, which is often more suited to the analytical style of statsmodels.
A basic wrapper for statsmodels that emulates a scikit-learn estimator.
StatsModelsWrapper(glm_partial, stated_estimator_type='regressor', scorer=<function r2_score>)¶
Wrap a statsmodels GLM as a sklearn (fake) BaseEstimator for YellowBrick.
This wrapper is trivial, options and extra things like weights are not currently handled.
First import the external libraries and helper utilities:
>>> import statsmodels.api as sm >>> from functools import partial
Instantiate a partial with the statsmodels API:
>>> glm_gaussian_partial = partial(sm.GLM, family=sm.families.Gaussian()) >>> sm_est = StatsModelsWrapper(glm_gaussian_partial)
Create a Yellowbrick visualizer to visualize prediction error:
>>> visualizer = PredictionError(sm_est) >>> visualizer.fit(X_train, y_train) >>> visualizer.score(X_test, y_test)
For statsmodels usage, calling .summary() etc:
>>> gaussian_model = glm_gaussian_partial(y_train, X_train)
Pretend to be a sklearn estimator, fit is called on creation