A word’s importance can be weighed by its dispersion in a corpus. Lexical dispersion is a measure of a word’s homogeneity across the parts of a corpus. The plot marks each occurrence of a word against its offset, counted in words, from the beginning of the corpus.
from yellowbrick.text import DispersionPlot
After importing the visualizer, we can load the corpus and draw the plot:
# Load the text data
# (load_corpus is a helper that is not defined in this section)
corpus = load_corpus("hobbies")

# Create a list of words from the corpus text
text = [doc.split() for doc in corpus.data]

# Choose words whose occurrence in the text will be plotted
target_words = ['Game', 'player', 'score', 'oil', 'Man']

# Create the visualizer and draw the plot
visualizer = DispersionPlot(target_words)
visualizer.fit(text)
visualizer.poof()
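To make concrete what the plot encodes, the following is a minimal sketch of the underlying idea, not Yellowbrick's implementation. It assumes each document is already a list of words; the helper name `dispersion_points` is hypothetical. Each occurrence of a target word is recorded with its offset in words from the start of the corpus.

```python
from typing import Iterable, List, Tuple


def dispersion_points(
    docs: Iterable[List[str]], target_words: List[str]
) -> List[Tuple[int, int]]:
    """Return (offset, target_index) pairs, one per target word occurrence.

    offset counts words from the start of the corpus; target_index is the
    position of the word in target_words (one row per word on a plot).
    """
    index = {word: i for i, word in enumerate(target_words)}
    points = []
    offset = 0
    for doc in docs:
        for word in doc:
            if word in index:
                points.append((offset, index[word]))
            offset += 1  # the offset keeps running across document boundaries
    return points


docs = [
    ["the", "game", "was", "close"],
    ["oil", "prices", "and", "the", "game"],
]
points = dispersion_points(docs, ["game", "oil"])
print(points)  # [(1, 0), (4, 1), (8, 0)]
```

Plotting the offsets on the x-axis and the target index on the y-axis gives one row of marks per target word, which is the general shape of a dispersion plot.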
Implementation of lexical dispersion for text visualization
DispersionPlot(target_words, ax=None, colors=None, ignore_case=False, annotate_docs=False, labels=None, colormap=None, **kwargs)
DispersionPlot allows for visualization of the lexical dispersion of words in a corpus. Lexical dispersion is a measure of a word’s homogeneity across the parts of a corpus. The plot marks each occurrence of a word against its offset, counted in words, from the beginning of the corpus.
- target_words : list
A list of target words whose dispersion across a corpus passed at fit will be visualized.
- ax : matplotlib axes, default: None
The axes to plot the figure on.
- labels : list of strings
The names of the classes in the target, used to create a legend. Labels must match names of classes in sorted order.
- colors : list or tuple of colors
Specify the colors for each individual class
- colormap : string or matplotlib cmap
Qualitative colormap for discrete target
- ignore_case : boolean, default: False
Specify whether case is ignored when matching target words. By default, matching is case-sensitive.
- annotate_docs : boolean, default: False
Specify whether document boundaries will be displayed. Vertical lines are positioned at the end of each document.
- kwargs : dict
Pass any additional keyword arguments to the super class.
These parameters can be influenced later on in the visualization process, but can and should be set as early as possible.
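As an illustration of what `ignore_case` and `annotate_docs` describe, here is a small sketch under stated assumptions; it is not the library's code. The helpers `normalize` and `document_boundaries` are hypothetical, and the boundary positions are assumed to be cumulative word counts, consistent with lines drawn at the end of each document.

```python
from itertools import accumulate
from typing import List


def normalize(words: List[str], ignore_case: bool = False) -> List[str]:
    """Lowercase the words when case is ignored; otherwise copy them as-is."""
    return [w.lower() for w in words] if ignore_case else list(words)


def document_boundaries(docs: List[List[str]]) -> List[int]:
    """Word offset at which each document ends (cumulative lengths)."""
    return list(accumulate(len(doc) for doc in docs))


docs = [["Game", "over"], ["the", "game", "goes", "on"]]

# Case-sensitive matching only finds the exact spelling "Game".
sensitive = [w for doc in docs for w in doc if w in ["Game"]]

# Case-insensitive matching folds both targets and corpus words.
targets = normalize(["Game"], ignore_case=True)
insensitive = [
    w for doc in docs for w in normalize(doc, ignore_case=True) if w in targets
]

boundaries = document_boundaries(docs)
print(len(sensitive), len(insensitive), boundaries)  # 1 2 [2, 6]
```

With the default `ignore_case=False`, targets such as 'Game' and 'game' are treated as different words, so the choice of capitalization in `target_words` matters.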
draw(points, target=None, **kwargs)
Called from the fit method, this method creates the canvas and draws the plot on it.
- kwargs : dict
Generic keyword arguments.
finalize(**kwargs)
The finalize method executes any subclass-specific axes finalization steps. The user calls poof, and poof calls finalize.
- kwargs : dict
Generic keyword arguments.
fit(X, y=None, **kwargs)
The fit method is the primary drawing input for the dispersion visualization.
- X : list or generator
A list of documents, or a generator that yields documents, where each document is a list of words in the order they appear in the document.
- y : ndarray or Series of length n
An optional array or series of target or class values for instances. If this is specified, then the points will be colored according to their class.
- kwargs : dict
Pass generic arguments to the drawing method
- self : instance
Returns the instance of the transformer/visualizer
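The input shape described for fit can be sketched as follows. This is a minimal example under stated assumptions: the raw strings, the class labels, and the `tokenize` helper are hypothetical, whitespace splitting stands in for a real tokenizer, and `y` is assumed to hold one label per document.

```python
from typing import Iterable, Iterator, List

# Hypothetical raw documents and one class label per document.
raw_docs = ["the game was close", "oil prices rose again"]
labels = ["sports", "finance"]


def tokenize(docs: Iterable[str]) -> Iterator[List[str]]:
    """Yield each document as a list of words in their original order."""
    for doc in docs:
        yield doc.split()


X = list(tokenize(raw_docs))  # a generator could also be passed directly
y = labels

# Each document needs exactly one target value.
assert len(X) == len(y)
print(X[0])  # ['the', 'game', 'was', 'close']

# With Yellowbrick installed, X and y would then be passed to the visualizer:
# visualizer = DispersionPlot(["game", "oil"])
# visualizer.fit(X, y)
# visualizer.poof()
```

Passing `y` is optional; when it is omitted, the points are not colored by class.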