A word’s importance can be weighed by its dispersion in a corpus. Lexical dispersion is a measure of a word’s homogeneity across the parts of a corpus. The dispersion plot marks each occurrence of a word against its offset, counted in words from the beginning of the corpus.
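The offsets that such a plot is built from can be sketched in plain Python. This is only an illustration of the idea, not the library's implementation; the helper name `word_offsets` and the sample sentence are hypothetical.

```python
def word_offsets(words, targets):
    """Map each target word to the positions (in words) where it occurs."""
    offsets = {target: [] for target in targets}
    for position, word in enumerate(words):
        if word in offsets:
            offsets[word].append(position)
    return offsets


# Hypothetical ten-word corpus, already split into words
text = "the game ended when the last player left the game".split()
offsets = word_offsets(text, ["game", "player", "score"])
print(offsets)
```

A word whose offsets are spread evenly over the corpus is dispersed homogeneously; a word whose offsets cluster in one region, or that has no offsets at all, is not.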
from yellowbrick.text import DispersionPlot
After importing the visualizer, we can load the corpus and draw the plot:
# Load the text data
corpus = load_corpus("hobbies")

# Create a list of words from the corpus text
text = [word for doc in corpus.data for word in doc.split()]

# Choose words whose occurrence in the text will be plotted
target_words = ['Game', 'player', 'score', 'oil', 'Man']

# Create the visualizer and draw the plot
visualizer = DispersionPlot(target_words)
visualizer.fit(text)
visualizer.poof()
Implementation of lexical dispersion for text visualization
DispersionPlot(words, ax=None, color=None, ignore_case=False, **kwargs)
The DispersionPlot visualizer allows for visualization of the lexical dispersion of words in a corpus. Lexical dispersion is a measure of a word’s homogeneity across the parts of a corpus. This plot marks each occurrence of a word against its offset, counted in words from the beginning of the corpus.
- words : list
A list of target words whose dispersion across a corpus passed at fit will be visualized.
- ax : matplotlib axes, default: None
The axes to plot the figure on.
- color : list or tuple of colors
Specify the colors used to draw the plot.
- ignore_case : boolean, default: False
If True, target words are matched without regard to case. By default, matching is case-sensitive.
- kwargs : dict
Pass any additional keyword arguments to the super class.
These parameters can be influenced later on in the visualization process, but can and should be set as early as possible.
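The effect of the ignore_case parameter can be illustrated with a small stand-alone sketch. It does not call the library and is not its implementation; `count_matches` and the sample word list are hypothetical, and the sketch assumes that ignoring case amounts to lowercasing both the corpus words and the target words before comparing them.

```python
def count_matches(words, targets, ignore_case=False):
    """Count occurrences of each target, optionally ignoring case."""
    # Assumption: case-insensitive matching is modeled by lowercasing both sides
    normalize = str.lower if ignore_case else (lambda w: w)
    normalized = [normalize(w) for w in words]
    return {t: normalized.count(normalize(t)) for t in targets}


# Hypothetical corpus in which "game" appears with mixed capitalization
text = ["Game", "over", "the", "game", "was", "a", "good", "game"]

sensitive = count_matches(text, ["Game"])
insensitive = count_matches(text, ["Game"], ignore_case=True)
print(sensitive, insensitive)
```

With the default case-sensitive setting, a capitalized target such as 'Game' only matches occurrences written with the same capitalization, so a target list mixing 'Game' and 'player' treats the two words differently at sentence starts.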
Called from the fit method, this method creates the canvas and draws the distribution plot on it.
- kwargs : dict
Generic keyword arguments.
The finalize method executes any subclass-specific axes finalization steps. The user calls poof, and poof calls finalize.
- kwargs : dict
Generic keyword arguments.
The fit method is the primary drawing input for the dispersion visualization. It requires the corpus as a list of words.
- text : list
A list of words in the order they appear in the corpus.
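Because fit expects a flat, ordered list of words, the documents of a corpus have to be flattened first. The following is a minimal sketch of one way to do that, assuming whitespace tokenization; `flatten_corpus`, its `strip_punctuation` flag, and the sample documents are hypothetical and not part of the library.

```python
import string


def flatten_corpus(documents, strip_punctuation=True):
    """Turn a list of document strings into one ordered list of words."""
    words = []
    for doc in documents:
        for token in doc.split():
            if strip_punctuation:
                # Remove leading and trailing punctuation so "score." becomes "score"
                token = token.strip(string.punctuation)
            if token:
                words.append(token)
    return words


# Hypothetical two-document corpus
docs = ["The player took the score.", "Oil prices fell, again."]
words = flatten_corpus(docs)
raw_words = flatten_corpus(docs, strip_punctuation=False)
print(words)
```

Splitting on whitespace alone leaves punctuation attached to tokens, so a target word such as 'score' would not equal the token 'score.'; stripping punctuation is one possible remedy, at the cost of altering the tokens.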