Dispersion Plot

A word’s importance can be weighed by its dispersion in a corpus. Lexical dispersion is a measure of a word’s homogeneity across the parts of a corpus. This plot marks each occurrence of a word by its offset, in words, from the beginning of the corpus.

from yellowbrick.text import DispersionPlot
from yellowbrick.datasets import load_hobbies

# Load the text data
corpus = load_hobbies()

# Split each document in the corpus into a list of words
text = [doc.split() for doc in corpus.data]

# Choose words whose occurrence in the text will be plotted
target_words = ['Game', 'player', 'score', 'oil', 'Man']

# Create the visualizer and draw the plot
visualizer = DispersionPlot(target_words)
visualizer.fit(text)
visualizer.show()
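
To make the plotted quantity concrete, the following minimal sketch computes, in plain Python, the word offsets at which each target word occurs. It is an illustration under stated assumptions, not Yellowbrick's implementation: the helper name `word_offsets` is hypothetical, the two documents are made up, and counting offsets from 1 is a choice made here.

```python
from collections import defaultdict


def word_offsets(docs, target_words):
    """Map each target word to the offsets at which it occurs.

    `docs` is a list of documents, each a list of words. Offsets count
    words from the beginning of the corpus, starting at 1 (an assumption
    of this sketch, not a documented Yellowbrick convention).
    """
    targets = set(target_words)
    offsets = defaultdict(list)
    position = 0
    for doc in docs:
        for word in doc:
            position += 1  # running count across document boundaries
            if word in targets:
                offsets[word].append(position)
    return dict(offsets)


# Made-up documents, tokenized on whitespace as in the example above
docs = [
    "the player took the score".split(),
    "oil prices hit the player".split(),
]
result = word_offsets(docs, ["player", "score", "oil"])
print(result)
# {'player': [2, 10], 'score': [5], 'oil': [6]}
```

Each offset in the result corresponds to one mark on the horizontal axis of a dispersion plot, with one row per target word.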

API Reference

Implementation of lexical dispersion for text visualization

class yellowbrick.text.dispersion.DispersionPlot(target_words, ax=None, colors=None, ignore_case=False, annotate_docs=False, labels=None, colormap=None, **kwargs)

Bases: yellowbrick.text.base.TextVisualizer

DispersionPlot allows for visualization of the lexical dispersion of words in a corpus. Lexical dispersion is a measure of a word’s homogeneity across the parts of a corpus. This plot marks each occurrence of a word by its offset, in words, from the beginning of the corpus.

Parameters
target_words : list

A list of target words whose dispersion across a corpus passed at fit will be visualized.

ax : matplotlib axes, default: None

The axes to plot the figure on.

labels : list of strings

The names of the classes in the target, used to create a legend. Labels must match names of classes in sorted order.

colors : list or tuple of colors

Specify the colors for each individual class.

colormap : string or matplotlib cmap

Qualitative colormap for discrete target.

ignore_case : boolean, default: False

Specify whether case is ignored when matching target words. By default, matching is case-sensitive.

annotate_docs : boolean, default: False

Specify whether document boundaries will be displayed. Vertical lines are positioned at the end of each document.

kwargs : dict

Pass any additional keyword arguments to the super class.

These parameters can be influenced later on in the visualization process, but can and should be set as early as possible.
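
As a rough illustration of what `ignore_case` and `annotate_docs` imply, the sketch below lowercases words before matching and records the cumulative word count at the end of each document, which is where a boundary line would fall. The helper `offsets_and_boundaries` is hypothetical and is not part of Yellowbrick; it only mirrors the behavior described in the parameter list, and the 1-based offsets are an assumption of the sketch.

```python
def offsets_and_boundaries(docs, target_words, ignore_case=False):
    """Return (matches, boundaries) for a tokenized corpus.

    matches: list of (offset, word) pairs, with offsets counted from 1.
    boundaries: the offset of the last word of each document.
    """
    if ignore_case:
        # Fold the targets too, so "Game" and "game" compare equal
        target_words = [w.lower() for w in target_words]
    targets = set(target_words)

    matches, boundaries = [], []
    position = 0
    for doc in docs:
        for word in doc:
            position += 1
            key = word.lower() if ignore_case else word
            if key in targets:
                matches.append((position, key))
        # End of document: a boundary line would be drawn here
        boundaries.append(position)
    return matches, boundaries


# Made-up, pre-tokenized documents
docs = [["Game", "over", "man"], ["The", "game", "score"]]

strict, _ = offsets_and_boundaries(docs, ["Game", "Man"])
relaxed, bounds = offsets_and_boundaries(docs, ["Game", "Man"], ignore_case=True)
```

With case-sensitive matching only the literal `Game` is found, while the case-insensitive run also picks up `man` and `game`; `bounds` holds one offset per document.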
NULL_CLASS = None
draw(self, points, target=None, **kwargs)

Called from the fit method, this method creates the canvas and draws the plot on it.

Parameters
kwargs: generic keyword arguments.

finalize(self, **kwargs)

Prepares the figure for rendering by adding a title, axis labels, and managing the limits of the text labels. Adds a legend outside of the plot.

Parameters
kwargs: generic keyword arguments.

Notes

Generally this method is called from show and not directly by the user.

fit(self, X, y=None, **kwargs)

The fit method is the primary drawing input for the dispersion visualization.

Parameters
X : list or generator

A list of documents, or a generator that yields documents, where each document is a list of words in the order they appear in that document.

y : ndarray or Series of length n

An optional array or series of target or class values for instances. If this is specified, then the points will be colored according to their class.

kwargs : dict

Pass generic arguments to the drawing method.

Returns
self : instance

Returns the instance of the transformer/visualizer.
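
The input shape described for `fit` can be produced lazily from raw strings. The sketch below assumes whitespace tokenization, as in the example at the top of this page, and uses a hypothetical helper `tokenized` with made-up documents and labels. It shows only the shape of `X` and `y`, not a call into Yellowbrick.

```python
def tokenized(raw_docs):
    """Yield each document as a list of words, in reading order."""
    for raw in raw_docs:
        yield raw.split()


raw_docs = ["the game was close", "oil prices rose again"]  # made-up text
y = ["sports", "finance"]  # one class label per document

# Materialized here for inspection; a generator could be used as-is
X = list(tokenized(raw_docs))
```

Data in this shape would then be passed as `visualizer.fit(X, y)`. Since `y` is documented as an ndarray or Series, converting the label list first (for example with `numpy.asarray`) may be the safer choice.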