Yellowbrick: Machine Learning Visualization¶
Yellowbrick extends the Scikit-Learn API to make model selection and hyperparameter tuning easier. Under the hood, it uses Matplotlib.
Recommended Learning Path¶
Interested in contributing to Yellowbrick? Yellowbrick is a welcoming, inclusive project and we would love to have you. We follow the Python Software Foundation Code of Conduct.
No matter your level of technical skill, you can be helpful. We appreciate bug reports, user testing, feature requests, bug fixes, product enhancements, and documentation improvements.
Check out the Contributing guide!
If you’ve signed up to do user testing, head over to the User Testing Instructions.
Please consider joining the Google Groups Listserv so you can respond to questions.
Thank you for your contributions!
Concepts & API¶
The primary goal of Yellowbrick is to create a sensible API similar to Scikit-Learn's.
Visualizers are the core objects in Yellowbrick. They are similar to transformers in Scikit-Learn. Visualizers can wrap a model estimator, similar to how the “ModelCV” (e.g. RidgeCV, LassoCV) methods work.
Some of our most popular visualizers include:
- Rank Features: pairwise ranking of features to detect relationships
- Parallel Coordinates: horizontal visualization of instances
- Radial Visualization: separation of instances around a circular plot
- PCA Projection: projection of instances based on principal components
- Manifold Visualization: high dimensional visualization with manifold learning
- Feature Importances: rank features by importance or linear coefficients for a specific model
- Recursive Feature Elimination: find the best subset of features based on importance
- Joint Plots: direct data visualization with feature selection
- Class Prediction Error: shows error and support in classification
- Classification Report: visual representation of precision, recall, and F1
- ROC/AUC Curves: receiver operating characteristic and area under the curve
- Precision-Recall Curves: precision vs recall for different probability thresholds
- Confusion Matrices: visual description of class decision making
- Discrimination Threshold: find a threshold that best separates binary classes
Model Selection Visualization¶
- Balanced Binning Reference: generate a histogram with vertical lines marking the recommended values at which to split the data into evenly distributed bins
- Class Balance: see how the distribution of classes affects the model
- Feature Correlation: display the correlation between features and dependent variables
- Term Frequency: visualize the frequency distribution of terms in the corpus
- t-SNE Corpus Visualization: use stochastic neighbor embedding to project documents
- Dispersion Plot: visualize how key terms are dispersed throughout a corpus
- UMAP Corpus Visualization: plot similar documents closer together to discover clusters
- PosTag Visualization: plot the counts of different parts-of-speech throughout a tagged corpus
… and more! Visualizers are being added all the time. Check the examples (or even the develop branch). Feel free to contribute your ideas for new Visualizers!
Can’t get something to work? Here are places you can find help.
Table of Contents¶
- Quick Start
- Model Selection Tutorial
- Visualizers and API
- User Testing Instructions
- Effective Matplotlib
- Yellowbrick for Teachers
- Frequently Asked Questions
- Code of Conduct