Yellowbrick: Machine Learning Visualization
Yellowbrick is a suite of visual diagnostic tools called “Visualizers” that extend the scikit-learn API to allow human steering of the model selection process. In a nutshell, Yellowbrick combines scikit-learn with matplotlib, in the best tradition of the scikit-learn documentation, to produce visualizations for your models. For more on Yellowbrick, please see the About page.
If you’re new to Yellowbrick, check out the Quick Start or skip ahead to the Model Selection Tutorial. Yellowbrick is a rich library, with new Visualizers added on a regular basis. For details on specific Visualizers and extended usage, head over to the Visualizers and API. Interested in contributing to Yellowbrick? Check out the contributing guide. If you’ve signed up to do user testing, head over to the User Testing Instructions (and thank you!).
Visualizers are estimators (objects that learn from data) whose primary objective is to create visualizations that allow insight into the model selection process. In scikit-learn terms, they can behave like transformers when visualizing the data space, or they can wrap a model estimator, similar to how the “ModelCV” estimators (e.g. RidgeCV, LassoCV) work. The primary goal of Yellowbrick is to provide a sensible API similar to scikit-learn’s. Some of our most popular visualizers include:
- Rank Features: pairwise ranking of features to detect relationships
- Parallel Coordinates: horizontal visualization of instances
- Radial Visualization: separation of instances around a circular plot
- PCA Projection: projection of instances based on principal components
- Manifold Visualization: high dimensional visualization with manifold learning
- Feature Importances: rank features by importance or linear coefficients for a specific model
- Recursive Feature Elimination: find the best subset of features based on importance
- Joint Plots: direct data visualization with feature selection
- Class Balance: see how the distribution of classes affects the model
- Class Prediction Error: shows error and support in classification
- Classification Report: visual representation of precision, recall, and F1
- ROC/AUC Curves: receiver operating characteristic and area under the curve
- Confusion Matrices: visual description of class decision making
- Discrimination Threshold: find a threshold that best separates binary classes
Model Selection Visualization
- Term Frequency: visualize the frequency distribution of terms in the corpus
- t-SNE Corpus Visualization: use stochastic neighbor embedding to project documents
- Dispersion Plot: visualize how key terms are dispersed throughout a corpus
… and more! Visualizers are being added all the time; be sure to check the examples (or even the develop branch) and feel free to contribute your ideas for new Visualizers!
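The idea that a Visualizer can wrap a model estimator, much like the “ModelCV” estimators do, can be illustrated with a toy sketch. This is not Yellowbrick’s implementation and it does not use Yellowbrick or matplotlib: `MajorityClassifier` and `AccuracyVisualizer` are hypothetical classes written only for this example, a text bar stands in for a real figure, and the `show` method name is an assumption. Only the `fit`/`predict`/`score` convention is taken from scikit-learn.

```python
from collections import Counter


class MajorityClassifier:
    """Hypothetical stand-in model: always predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class AccuracyVisualizer:
    """Hypothetical toy wrapper: delegates to a model and renders its score."""

    def __init__(self, model):
        self.model = model

    def fit(self, X, y):
        # Fitting the wrapper fits the wrapped model.
        self.model.fit(X, y)
        return self

    def score(self, X, y):
        # Compute accuracy from the wrapped model's predictions and keep it for drawing.
        predictions = self.model.predict(X)
        hits = sum(1 for p, t in zip(predictions, y) if p == t)
        self.score_ = hits / len(y)
        return self.score_

    def show(self, width=20):
        # A text bar stands in for the figure a real Visualizer would draw.
        filled = round(self.score_ * width)
        bar = "#" * filled + "." * (width - filled)
        return "[" + bar + "] " + format(self.score_, ".2f")


# Made-up data, for illustration only.
X_train, y_train = [[0], [1], [2], [3]], ["a", "a", "a", "b"]
X_test, y_test = [[4], [5]], ["a", "b"]

viz = AccuracyVisualizer(MajorityClassifier())
viz.fit(X_train, y_train)
accuracy = viz.score(X_test, y_test)
print(viz.show())
```

Because the wrapper exposes the same `fit` and `score` calls as the model it wraps, it could be dropped into an existing workflow in place of the bare model, with the drawing step added at the end.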
Yellowbrick is a welcoming, inclusive project in the tradition of matplotlib and scikit-learn. Similar to those projects, we follow the Python Software Foundation Code of Conduct. Please don’t hesitate to reach out to us for help or if you have any contributions or bugs to report!
The primary way to ask for help with Yellowbrick is to post on our Google Groups Listserv. This is an email list/forum where members of the community can join and respond to each other, and it is usually the quickest way to get a response. Please also consider joining the group so you can respond to questions! You can also ask questions on Stack Overflow and tag them with “yellowbrick”, add issues on GitHub, or tweet or direct message us on Twitter @scikit_yb.
Table of Contents
The following is a complete listing of the Yellowbrick documentation for this version of the library:
- Quick Start
- Model Selection Tutorial
- Visualizers and API
- User Testing Instructions
- Effective Matplotlib
- Code of Conduct