Cross Validation Scores

Generally we determine whether a given model is optimal by looking at its F1, precision, recall, and accuracy (for classification), or its coefficient of determination (R2) and error (for regression). However, real world data is often distributed somewhat unevenly, meaning that the fitted model is likely to perform better on some sections of the data than on others. Yellowbrick's CVScores visualizer enables us to visually explore these variations in performance using different cross validation strategies.

Cross Validation

Cross-validation starts by shuffling the data (to prevent any unintentional ordering errors) and splitting it into k folds. Then k models are fit on \(\frac{k-1} {k}\) of the data (called the training split) and evaluated on \(\frac {1} {k}\) of the data (called the test split). The results from each evaluation are averaged together for a final score, then the final model is fit on the entire dataset for operationalization.
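The k-fold procedure described above can be sketched directly with scikit-learn, which Yellowbrick wraps. This is a minimal illustration (the dataset and estimator here are arbitrary choices, not part of the example that follows): each of the k folds serves once as the test split, and the per-fold scores are averaged into a final score.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# Shuffle the data and split it into k=5 folds; each model is fit on
# 4/5 of the data and evaluated on the held-out 1/5
cv = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(GaussianNB(), X, y, cv=cv)

print(scores)         # one score per fold
print(scores.mean())  # the averaged final score
```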


In Yellowbrick, the CVScores visualizer displays cross-validated scores as a bar chart (one bar for each fold) with the average score across all folds plotted as a horizontal dotted line.


In the following example we show how to visualize cross-validated scores for a classification model. After loading a DataFrame, we create a StratifiedKFold cross-validation strategy to ensure that each split represents all of our classes in the same proportion as the full dataset. We then fit the CVScores visualizer using the f1_weighted scoring metric as opposed to the default metric, accuracy, to get a better sense of the relationship of precision and recall in our classifier across all of our folds.

import pandas as pd
import matplotlib.pyplot as plt

from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import StratifiedKFold

from yellowbrick.model_selection import CVScores

# Load the classification data set
# (load_data is a documentation helper that returns the example
# dataset as a pandas DataFrame)
data = load_data("occupancy")

# Specify the features of interest
features = ["temperature", "relative humidity", "light", "C02", "humidity"]

# Extract the instances and target
X = data[features]
y = data.occupancy

# Create a new figure and axes
_, ax = plt.subplots()

# Create a cross-validation strategy
cv = StratifiedKFold(12)

# Create the cv score visualizer
oz = CVScores(
    MultinomialNB(), ax=ax, cv=cv, scoring='f1_weighted'
)

# Fit the visualizer to the data and render the plot
oz.fit(X, y)
oz.poof()

Our resulting visualization shows that while our average cross-validation score is quite high, there are some splits on which our fitted MultinomialNB classifier performs significantly worse.



In this next example we show how to visualize cross-validated scores for a regression model. After loading our energy data into a DataFrame, we instantiate a simple KFold cross-validation strategy. We then fit the CVScores visualizer using the r2 scoring metric to get a sense of the coefficient of determination for our regressor across all of our folds.

from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

# Load the regression data set
data = load_data("energy")

# Specify the features of interest and the target
targets = ["heating load", "cooling load"]
features = [col for col in data.columns if col not in targets]

# Extract the instances and target
X = data[features]
y = data[targets[1]]

# Create a new figure and axes
_, ax = plt.subplots()

cv = KFold(12)

oz = CVScores(
    Ridge(), ax=ax, cv=cv, scoring='r2'
)

# Fit the visualizer to the data and render the plot
oz.fit(X, y)
oz.poof()

As with our classification CVScores visualization, our regression visualization suggests that our Ridge regressor performs very well (e.g. produces a high coefficient of determination) across nearly every fold, resulting in another fairly high overall R2 score.


API Reference

Implements cross-validation score plotting for model selection.

class yellowbrick.model_selection.cross_validation.CVScores(model, ax=None, cv=None, scoring=None, **kwargs)[source]

Bases: yellowbrick.base.ModelVisualizer

CVScores displays cross-validated scores as a bar chart, with the average of the scores plotted as a horizontal line.

model : a scikit-learn estimator

An object that implements fit and predict, can be a classifier, regressor, or clusterer so long as there is also a valid associated scoring metric. Note that the object is cloned for each validation.

ax : matplotlib.Axes object, optional

The axes object to plot the figure on.

cv : int, cross-validation generator or an iterable, optional

Determines the cross-validation splitting strategy. Possible inputs for cv are:

  • None, to use the default 3-fold cross-validation,
  • an integer, to specify the number of folds,
  • an object to be used as a cross-validation generator,
  • an iterable yielding train/test splits.

See the scikit-learn cross-validation guide for more information on the possible strategies that can be used here.
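The first three kinds of cv input above can be sketched with scikit-learn's cross_val_score, which CVScores wraps; the fourth accepts explicit (train, test) index pairs. The toy data and split indices below are illustrative choices, not part of the documented API.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.RandomState(0)
X = rng.rand(20, 3)
y = rng.rand(20)
model = Ridge()

# An integer: the number of folds
s1 = cross_val_score(model, X, y, cv=4)

# A cross-validation generator object
s2 = cross_val_score(model, X, y, cv=KFold(n_splits=4))

# An iterable yielding explicit (train, test) index splits
splits = [
    (np.arange(0, 15), np.arange(15, 20)),
    (np.arange(5, 20), np.arange(0, 5)),
]
s3 = cross_val_score(model, X, y, cv=splits)

print(len(s1), len(s2), len(s3))  # one score per split: 4 4 2
```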

scoring : string, callable or None, optional, default: None

A string or scorer callable object / function with signature scorer(estimator, X, y).

See scikit-learn cross-validation guide for more information on the possible metrics that can be used.
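To illustrate the scorer(estimator, X, y) signature mentioned above, here is a sketch of a custom scorer passed as the scoring callable. The metric itself (negative maximum absolute error) and the synthetic data are invented for the example; any callable returning a single number, where higher means better, would do.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
X = rng.rand(30, 2)
y = X @ np.array([1.5, -2.0]) + rng.normal(scale=0.1, size=30)

# A scorer is any callable with the signature scorer(estimator, X, y)
# that returns a single number; higher scores should mean better fits,
# hence the negation of the (nonnegative) maximum error
def neg_max_error(estimator, X, y):
    pred = estimator.predict(X)
    return -np.max(np.abs(pred - y))

scores = cross_val_score(Ridge(), X, y, cv=3, scoring=neg_max_error)
print(scores.shape)  # (3,)
```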

kwargs : dict

Keyword arguments that are passed to the base class and may influence the visualization as defined in other Visualizers.


This visualizer is a wrapper for sklearn.model_selection.cross_val_score.

Refer to the scikit-learn cross-validation guide for more details.


>>> from sklearn import datasets, svm
>>> iris = datasets.load_iris()
>>> clf = svm.SVC(kernel='linear', C=1)
>>> X = iris.data
>>> y = iris.target
>>> visualizer = CVScores(model=clf, cv=5, scoring='f1_macro')
>>> visualizer.fit(X, y)
>>> visualizer.poof()

draw(**kwargs)[source]

Creates the bar chart of the cross-validated scores generated from the fit method and places a dashed horizontal line that represents the average value of the scores.


finalize(**kwargs)[source]

Adds the title, legend, and other final visual touches to the plot.

fit(X, y, **kwargs)[source]

Fits the wrapped model to the specified data using the supplied cross-validation strategy, draws the score for each fold, and saves the scores to the visualizer.

X : array-like, shape (n_samples, n_features)

Training vector, where n_samples is the number of samples and n_features is the number of features.

y : array-like, shape (n_samples) or (n_samples, n_features), optional

Target relative to X for classification or regression; None for unsupervised learning.

self : instance

The fitted visualizer, returned to support method chaining.