Figures and Axes
This document is an open letter to the PyData community, particularly those who are involved in matplotlib development. We’d like to get some advice on the API choice we’ve made and thoughts about our use of the matplotlib Axes objects.
One of the most complex parts of designing a visualization library around matplotlib is working with figures and axes. As defined in The Lifecycle of a Plot, these central objects of matplotlib plots are as follows:
A Figure is the final image that may contain 1 or more Axes.
An Axes represents an individual plot.
Based on these definitions and the advice to “try to use the object-oriented interface over the pyplot interface”, the Yellowbrick interface is designed to wrap a matplotlib axes.Axes. We propose the following general use case for most visualizers:
    import matplotlib.pyplot as plt
    from yellowbrick import Visualizer, quick_visualizer

    fig, ax = plt.subplots()

    # Object oriented approach
    viz = Visualizer(ax=ax)
    viz.fit(X, y)
    viz.show()

    # Quick method approach
    viz = quick_visualizer(X, y, ax=ax)
    viz.show()
This design allows users to more directly control the size, style, and interaction with the plot (though YB does provide some helpers for these as well). For example, if a user wanted to generate a report with multiple visualizers for a classification problem, it might look something like:
    import matplotlib.pyplot as plt

    from yellowbrick.features import FeatureImportances
    from yellowbrick.classifier import ConfusionMatrix, ClassificationReport, ROCAUC
    from sklearn.linear_model import LogisticRegression

    # X_train, y_train, X_test, and y_test are assumed to be defined already
    fig, axes = plt.subplots(2, 2)

    model = LogisticRegression()

    # One visualizer per cell of the 2x2 grid of axes
    visualgrid = [
        FeatureImportances(model, ax=axes[0][0]),
        ConfusionMatrix(model, ax=axes[0][1]),
        ClassificationReport(model, ax=axes[1][0]),
        ROCAUC(model, ax=axes[1][1]),
    ]

    for viz in visualgrid:
        viz.fit(X_train, y_train)
        viz.score(X_test, y_test)
        viz.finalize()

    plt.show()
This is a common use case and we’re working on the idea of “visual pipelines” to support this type of development because, for machine learning, users generally want a suite of visualizers or a report, not just a single visualization. The API requirement to support this has therefore been that visualizers use the ax object passed to them and not plt. If the user does not pass a specific ax, then the global current axes is used via plt.gca. Generally, visualizers should behave as though they are a plot that is part of a larger figure.
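The requirement above can be sketched roughly as follows. This is a minimal illustration, not Yellowbrick's actual implementation: SketchVisualizer and its methods are hypothetical names, and the only matplotlib calls assumed are plt.subplots, plt.gca, and the standard Axes drawing methods.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt


class SketchVisualizer:
    """Hypothetical stand-in for a visualizer; not Yellowbrick's actual class."""

    def __init__(self, ax=None):
        self._ax = ax

    @property
    def ax(self):
        # Fall back to the global current axes only when none was supplied.
        if self._ax is None:
            self._ax = plt.gca()
        return self._ax

    def fit(self, X, y):
        # Draw through self.ax only, never through plt plotting functions.
        self.ax.scatter(X, y)
        return self

    def finalize(self):
        self.ax.set_title("sketch")
        return self


# The caller owns the figure; the visualizer only touches the axes it was given.
fig, (left, right) = plt.subplots(ncols=2)
viz = SketchVisualizer(ax=left).fit([1, 2, 3], [2, 4, 6]).finalize()
```

Because the visualizer never reaches for plt when drawing, the second axes in the figure stays empty and remains available for another visualizer.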
Visualizers are getting more complex, however, and some are becoming multi-axes plots in their own right. For example:
The ResidualsPlot has a scatter plot axes and a histogram axes
The JointPlot has a scatter plot and two histogram axes
Data-driven scatter plot axes often have colorbar axes
The PCA plot has scatter plot, color bar, and heatmap axes
The confusion matrix probability histogram is a grid of axes for each class pair
The ICDM has an inset axes that acts as a dynamic legend
Although it would have been easier to simply embed the figure into the visualizer and use a GridSpec or other layout tool, the focus on ensuring visualizers are individual plots that wrap an Axes has made us bend over backward to adjust the plot inside of the axes area that was originally supplied, primarily by using make_axes_locatable, which is part of the AxesGrid toolkit.
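As a rough illustration of this approach, the sketch below carves a histogram axes out of the area of a single supplied axes using make_axes_locatable from mpl_toolkits.axes_grid1. The data and variable names are made up for the example, and the layout only loosely resembles a residuals plot; it is not how any Yellowbrick visualizer is actually written.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.axes_grid1 import make_axes_locatable

# Synthetic stand-ins for predicted values and residuals (illustrative only).
rng = np.random.default_rng(0)
predicted = rng.normal(size=200)
residuals = rng.normal(scale=0.5, size=200)

# The caller supplies a single axes, as in the visualizer API described above.
fig, ax = plt.subplots()
ax.scatter(predicted, residuals, s=10)

# Shrink the supplied axes and append a second axes to its right,
# sharing the y axis so both show the same residual scale.
divider = make_axes_locatable(ax)
hax = divider.append_axes("right", size="20%", pad=0.1, sharey=ax)
hax.hist(residuals, bins=20, orientation="horizontal")
hax.tick_params(labelleft=False)
```

The appended axes belongs to the same figure as the original, but the visualizer only needed the axes it was handed in order to create it.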
Generally, it appears that the AxesGrid Toolkit is the right tool for Yellowbrick: many of the examples shown are similar to the things that Yellowbrick is trying to do. However, this package is not fully documented with examples, and some helper utilities that would be useful, for example the ImageGrid, still require a Figure.
At this point we are left with some important questions about Yellowbrick’s development roadmap:
Like Seaborn, should YB have two classes of visualizer, one that wraps an axes and one that wraps a figure?
Should we go all in on the AxesGrid toolkit and continue to restrict our use of the figure? Will this method be supported in the long run?
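To make the first question more concrete, here is one way a figure-wrapping visualizer might look if it owned its layout through a GridSpec. This is purely a hypothetical sketch: FigureLevelSketch is not part of Yellowbrick, and the joint-plot layout is only an assumed example of a multi-axes visualizer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np


class FigureLevelSketch:
    """Hypothetical figure-wrapping visualizer; not part of Yellowbrick."""

    def __init__(self, fig=None):
        # Own the whole figure instead of a single axes.
        self.fig = fig if fig is not None else plt.figure()
        grid = self.fig.add_gridspec(
            2, 2, width_ratios=(4, 1), height_ratios=(1, 4)
        )
        self.main_ax = self.fig.add_subplot(grid[1, 0])
        self.top_ax = self.fig.add_subplot(grid[0, 0], sharex=self.main_ax)
        self.side_ax = self.fig.add_subplot(grid[1, 1], sharey=self.main_ax)

    def fit(self, x, y):
        # Scatter in the main axes, marginal histograms in the other two.
        self.main_ax.scatter(x, y, s=10)
        self.top_ax.hist(x, bins=20)
        self.side_ax.hist(y, bins=20, orientation="horizontal")
        return self


rng = np.random.default_rng(0)
joint = FigureLevelSketch().fit(rng.normal(size=100), rng.normal(size=100))
```

The trade-off is visible in the constructor: the layout code is simpler than adjusting a supplied axes, but such a visualizer can no longer be dropped into one cell of a user's existing grid of subplots.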
Other notes and discussion: