Dispersion Plot

A word’s importance can be weighed by its dispersion in a corpus. Lexical dispersion is a measure of a word’s homogeneity across the parts of a corpus. This plot marks each occurrence of a word at its offset, that is, the number of words from the beginning of the corpus at which it appears.
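
To make the idea of an offset concrete, here is a minimal pure-Python sketch of the data such a plot is built from. The helper name word_offsets is hypothetical and is not part of Yellowbrick; it only illustrates recording, for each target word, the positions at which it occurs.

```python
from collections import defaultdict


def word_offsets(words, targets):
    """Map each target word to the list of positions where it occurs.

    Hypothetical helper for illustration only; not Yellowbrick's code.
    """
    target_set = set(targets)
    offsets = defaultdict(list)
    for position, word in enumerate(words):
        if word in target_set:
            offsets[word].append(position)
    # Keep the caller's target order and include targets with no hits.
    return {target: offsets.get(target, []) for target in targets}


words = "the game was long but the player won the game".split()
result = word_offsets(words, ["game", "player", "score"])
print(result)  # {'game': [1, 9], 'player': [6], 'score': []}
```

A word whose offsets cluster in one region is unevenly dispersed, while offsets spread along the whole range suggest the word is used throughout the corpus.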

from yellowbrick.text import DispersionPlot

After importing the visualizer, we can load the corpus:

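# Load the text data (load_corpus is assumed to be a helper defined elsewhere;
# it is not provided by the import above)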
corpus = load_corpus("hobbies")

# Create a list of words from the corpus text
text = [word for doc in corpus.data for word in doc.split()]

# Choose words whose occurrence in the text will be plotted
target_words = ['Game', 'player', 'score', 'oil', 'Man']

# Create the visualizer and draw the plot
visualizer = DispersionPlot(target_words)
visualizer.fit(text)
visualizer.poof()

API Reference

Implementation of lexical dispersion for text visualization

class yellowbrick.text.dispersion.DispersionPlot(words, ax=None, color=None, ignore_case=False, **kwargs)

Bases: yellowbrick.text.base.TextVisualizer

DispersionPlot allows for visualization of the lexical dispersion of words in a corpus. Lexical dispersion is a measure of a word’s homogeneity across the parts of a corpus. This plot marks each occurrence of a word at the number of words from the beginning of the corpus at which it appears.

Parameters:
words : list

A list of target words whose dispersion across a corpus passed at fit will be visualized.

ax : matplotlib axes, default: None

The axes to plot the figure on.

color : list or tuple of colors

Specify the color for the bars.

ignore_case : boolean, default: False

Specify whether to ignore case when matching the target words. With the default, False, matching is case-sensitive.

kwargs : dict

Pass any additional keyword arguments to the super class.

These parameters can be influenced later on in the visualization process, but can and should be set as early as possible.
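
The effect of case sensitivity on which tokens count as occurrences can be sketched in plain Python. This is not the library's implementation; count_matches is a hypothetical helper that only illustrates the difference between exact and case-folded comparison.

```python
def count_matches(words, target, ignore_case=False):
    """Count tokens equal to target, optionally ignoring case.

    Hypothetical helper for illustration only; not Yellowbrick's code.
    """
    if ignore_case:
        target = target.lower()
        return sum(1 for word in words if word.lower() == target)
    return sum(1 for word in words if word == target)


words = ["Game", "game", "GAME", "player"]
print(count_matches(words, "Game"))                    # 1
print(count_matches(words, "Game", ignore_case=True))  # 3
```

With case-sensitive matching, a capitalized target such as 'Game' only matches tokens written exactly that way, which is worth keeping in mind when choosing target words.
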
draw(points, **kwargs)

Called from the fit method, this method creates the canvas and draws the dispersion plot on it.

Parameters:
kwargs : dict

Generic keyword arguments.

finalize(**kwargs)

The finalize method executes any subclass-specific axes finalization steps. The user calls poof, and poof calls finalize.

Parameters:
kwargs : dict

Generic keyword arguments.

fit(text)

The fit method is the primary drawing input for the dispersion visualization. It requires the corpus as a list of words.

Parameters:
text : list

A list of words in the order they appear in the corpus.
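
Because fit expects a flat, ordered list of words, how the documents are tokenized affects which occurrences are found. The sketch below, with made-up sample documents and a hypothetical tokenize helper, compares whitespace splitting with a simple regular-expression tokenizer; neither is claimed to be what Yellowbrick does internally.

```python
import re

# Made-up sample documents, for illustration only.
docs = ["The game ran long.", "A player scored late in the game."]

# Whitespace splitting keeps punctuation attached to words.
split_text = [word for doc in docs for word in doc.split()]


def tokenize(doc):
    """Return runs of word characters, dropping punctuation (hypothetical helper)."""
    return re.findall(r"\w+", doc)


regex_text = [word for doc in docs for word in tokenize(doc)]

print(split_text.count("game"))  # 1, because the final token is "game."
print(regex_text.count("game"))  # 2
```

Either list preserves the order of the words in the corpus; the choice only changes which tokens match a given target word.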