MissingValues Dispersion

The MissingValues Dispersion visualizer creates a chart that maps the position of missing values by the order of the index.

Without Targets Supplied

import numpy as np

from sklearn.datasets import make_classification
from yellowbrick.contrib.missing import MissingValuesDispersion

X, y = make_classification(
    n_samples=400, n_features=10, n_informative=2, n_redundant=3,
    n_classes=2, n_clusters_per_class=2, random_state=854
)

# assign some NaN values
X[X > 1.5] = np.nan
features = ["Feature {}".format(str(n)) for n in range(10)]

visualizer = MissingValuesDispersion(features=features)

visualizer.fit(X)
visualizer.show()

(Source code, png, pdf)

MissingValues Dispersion visualization on a dataset with no targets supplied

With Targets (y) Supplied

import numpy as np

from sklearn.datasets import make_classification
from yellowbrick.contrib.missing import MissingValuesDispersion

X, y = make_classification(
    n_samples=400, n_features=10, n_informative=2, n_redundant=3,
    n_classes=2, n_clusters_per_class=2, random_state=854
)

# assign some NaN values
X[X > 1.5] = np.nan
features = ["Feature {}".format(str(n)) for n in range(10)]

# Instantiate the visualizer
visualizer = MissingValuesDispersion(features=features)

visualizer.fit(X, y=y) # supply the targets via y
visualizer.show()

(Source code, png, pdf)

MissingValues Dispersion visualization on a dataset with no targets supplied

API Reference

Dispersion visualizer for locations of missing values by column against index position.

class yellowbrick.contrib.missing.dispersion.MissingValuesDispersion(alpha=0.5, marker='|', classes=None, **kwargs)[source]

Bases: MissingDataVisualizer

The Missing Values Dispersion visualizer shows the locations of missing (nan) values in the feature dataset by the order of the index.

When y targets are supplied to fit, the output dispersion plot is color coded according to the target y that the element refers to.

Parameters
alphafloat, default: 0.5

A value for bending elments with the background.

markermatplotlib marker, default: |

The marker used for each element coordinate in the plot

classeslist, default: None

A list of class names for the legend. If classes is None and a y value is passed to fit then the classes are selected from the target vector.

kwargsdict

Keyword arguments that are passed to the base class and may influence the visualization as defined in other Visualizers.

Examples

>>> from yellowbrick.contrib.missing import MissingValuesDispersion
>>> visualizer = MissingValuesDispersion()
>>> visualizer.fit(X, y=y)
>>> visualizer.show()
Attributes
features_np.array

The feature labels ranked according to their importance

classes_np.array

The class labels for each of the target values

draw(X, y, **kwargs)[source]

Called from the fit method, this method creates a scatter plot that draws each instance as a class or target colored point, whose location is determined by the feature data set.

If y is not None, then it draws a scatter plot where each class is in a different color.

draw_multi_dispersion_chart(nan_locs)[source]

Draws a multi dimensional dispersion chart, each color corresponds to a different target variable.

finalize(**kwargs)[source]

Sets the title and x-axis label and adds a legend. Also ensures that the y tick labels are set to the feature names.

Parameters
kwargs: generic keyword arguments.

Notes

Generally this method is called from show and not directly by the user.

get_nan_locs(**kwargs)[source]

Gets the locations of nans in feature data and returns the coordinates in the matrix