Classification Report

The classification report visualizer displays the precision, recall, F1, and support scores for the model. In order to support easier interpretation and problem detection, the report integrates numerical scores with a color-coded heatmap. All heatmaps are in the range (0.0, 1.0) to facilitate easy comparison of classification models across different classification reports.

from sklearn.model_selection import train_test_split

# Load the classification data set
data = load_data("occupancy")

# Specify the features of interest and the classes of the target
features = [
    "temperature", "relative humidity", "light", "C02", "humidity"
classes = ["unoccupied", "occupied"]

# Extract the numpy arrays from the data frame
X = data[features].as_matrix()
y = data.occupancy.as_matrix()

# Create the train and test data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
from sklearn.naive_bayes import GaussianNB
from yellowbrick.classifier import ClassificationReport

# Instantiate the classification model and visualizer
bayes = GaussianNB()
visualizer = ClassificationReport(bayes, classes=classes, support=True), y_train)  # Fit the visualizer and the model
visualizer.score(X_test, y_test)  # Evaluate the model on the test data
g = visualizer.poof()             # Draw/show/poof the data

The classification report shows a representation of the main classification metrics on a per-class basis. This gives a deeper intuition of the classifier behavior over global accuracy which can mask functional weaknesses in one class of a multiclass problem. Visual classification reports are used to compare classification models to select models that are “redder”, e.g. have stronger classification metrics or that are more balanced.

The metrics are defined in terms of true and false positives, and true and false negatives. Positive and negative in this case are generic names for the classes of a binary classification problem. In the example above, we would consider true and false occupied and true and false unoccupied. Therefore a true positive is when the actual class is positive as is the estimated class. A false positive is when the actual class is negative but the estimated class is positive. Using this terminology the meterics are defined as follows:

Precision is the ability of a classiifer not to label an instance positive that is actually negative. For each class it is defined as as the ratio of true positives to the sum of true and false positives. Said another way, “for all instances classified positive, what percent was correct?”
Recall is the ability of a classifier to find all positive instances. For each class it is defined as the ratio of true positives to the sum of true positives and false negatives. Said another way, “for all instances that were actually positive, what percent was classified correctly?”
f1 score
The F1 score is a weighted harmonic mean of precision and recall such that the best score is 1.0 and the worst is 0.0. Generally speaking, F1 scores are lower than accuracy measures as they embed precision and recall into their computation. As a rule of thumb, the weighted average of F1 should be used to compare classifier models, not global accuracy.
Support is the number of actual occurrences of the class in the specified dataset. Imbalanced support in the training data may indicate structural weaknesses in the reported scores of the classifier and could indicate the need for stratified sampling or rebalancing. Support doesn’t change between models but instead diagnoses the evaluation process.

API Reference

Visual classification report for classifier scoring.

class yellowbrick.classifier.classification_report.ClassificationReport(model, ax=None, classes=None, cmap='YlOrRd', support=None, **kwargs)[source]

Bases: yellowbrick.classifier.base.ClassificationScoreVisualizer

Classification report that shows the precision, recall, F1, and support scores for the model. Integrates numerical scores as well as a color-coded heatmap.

ax : The axis to plot the figure on.
model : the Scikit-Learn estimator

Should be an instance of a classifier, else the __init__ will return an error.

classes : a list of class names for the legend

If classes is None and a y value is passed to fit then the classes are selected from the target vector.

cmap : string, default: 'YlOrRd'

Specify a colormap to define the heatmap of the predicted class against the actual class in the confusion matrix.

support: {True, False, None, ‘percent’, ‘count’}, default: None

Specify if support will be displayed. It can be further defined by whether support should be reported as a raw count or percentage.

kwargs : keyword arguments passed to the super class.


>>> from yellowbrick.classifier import ClassificationReport
>>> from sklearn.linear_model import LogisticRegression
>>> viz = ClassificationReport(LogisticRegression())
>>>, y_train)
>>> viz.score(X_test, y_test)
>>> viz.poof()
scores_ : dict of dicts

Outer dictionary composed of precision, recall, f1, and support scores with inner dictionaries specifiying the values for each class listed.


Renders the classification report across each axis.


Finalize executes any subclass-specific axes finalization steps. The user calls poof and poof calls finalize.

kwargs: generic keyword arguments.
score(X, y=None, **kwargs)[source]

Generates the Scikit-Learn classification report.

X : ndarray or DataFrame of shape n x m

A matrix of n instances with m features

y : ndarray or Series of length n

An array or series of target or class values