Mushroom

From the Audubon Society Field Guide; mushrooms are described in terms of physical characteristics and classified as poisonous or edible.

Samples total: 8124
Dimensionality: 4 (reduced from 22)
Features: categorical
Targets: str: {“edible”, “poisonous”}
Task(s): classification

Description

This data set includes descriptions of hypothetical samples corresponding to 23 species of gilled mushrooms in the Agaricus and Lepiota Family (pp. 500-525). Each species is identified as definitely edible, definitely poisonous, or of unknown edibility and not recommended. The last class was combined with the poisonous one. The Guide clearly states that there is no simple rule for determining the edibility of a mushroom; there is no rule like “leaflets three, let it be” for poison oak and poison ivy.

Citation

Downloaded from the UCI Machine Learning Repository on February 28, 2017.

Schlimmer, Jeffrey Curtis. “Concept acquisition through representational adjustment.” (1987).

Langley, Pat. “Trading off simplicity and coverage in incremental concept learning.” Machine Learning Proceedings 1988 (2014): 73.

Duch, Włodzisław, Rafał Adamczak, and Krzysztof Grabczewski. “Extraction of logical rules from training data using backpropagation networks.” The 1st Online Workshop on Soft Computing. 1996.

Duch, Włodzisław, Rafał Adamczak, and Krzysztof Grabczewski. “Extraction of crisp logical rules using constrained backpropagation networks.” (1997).

Loader

yellowbrick.datasets.loaders.load_mushroom(data_home=None, return_dataset=False)

Loads the mushroom multivariate dataset that is well suited to binary classification tasks. The dataset contains 8123 instances with 3 categorical attributes and a discrete target.

Yellowbrick datasets are hosted online; when a dataset is requested, it is downloaded to your local computer for use. An Internet connection is required the first time a dataset is downloaded, but if the data is already cached locally, nothing is downloaded. Yellowbrick compares the downloaded data against the dataset's known signature to confirm that the download completed successfully.
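The signature check described above can be illustrated with a minimal sketch. It assumes the signature is a SHA-256 hex digest of the downloaded file; the text here does not name the hash, and `sha256_of` and `verify_download` are hypothetical helpers, not Yellowbrick functions.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_signature):
    """Raise ValueError if the file does not match the known signature.

    Hypothetical helper: the hash algorithm is an assumption.
    """
    actual = sha256_of(path)
    if actual != expected_signature:
        raise ValueError(
            f"signature mismatch for {path}: expected {expected_signature}, got {actual}"
        )
    return True
```

A mismatch usually means the download was truncated or corrupted, so the natural response is to delete the file and download it again.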

By default, datasets are stored alongside the code, but the location can be changed with the data_home parameter or the $YELLOWBRICK_DATA environment variable.
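As a rough illustration of that lookup, the sketch below resolves a storage directory from an explicit argument first, then from $YELLOWBRICK_DATA, then from a fallback. The function `resolve_data_home` and its `default_home` argument are hypothetical stand-ins; Yellowbrick's own `get_data_home` may behave differently in detail.

```python
import os


def resolve_data_home(data_home=None, default_home="fixtures"):
    """Pick the dataset directory: argument, then env var, then default.

    Hypothetical sketch; `default_home` stands in for the library default.
    """
    if data_home is None:
        # Fall back to the environment variable, then to the default location.
        data_home = os.environ.get("YELLOWBRICK_DATA", default_home)
    # Expand "~" and normalize to an absolute path.
    return os.path.abspath(os.path.expanduser(data_home))
```

With this ordering, an explicit `data_home` always wins, which makes it easy to point a single call at a scratch directory without touching the environment.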

Parameters
data_home : str, optional

The path on disk where data is stored. If not passed in, it is looked up from $YELLOWBRICK_DATA or the default returned by get_data_home.

return_dataset : bool, default=False

Return the raw dataset object instead of X and y numpy arrays to get access to alternative targets, extra features, content and meta.

Returns
X : array-like with shape (n_instances, n_features) if return_dataset=False

A pandas DataFrame or numpy array describing the instance features.

y : array-like with shape (n_instances,) if return_dataset=False

A pandas Series or numpy array describing the target vector.

dataset : Dataset instance if return_dataset=True

The Yellowbrick Dataset object provides an interface for accessing the data in a variety of formats, as well as the associated metadata and content.
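The two return modes might be used as in the sketch below. The import path `yellowbrick.datasets.load_mushroom`, the `to_data()` method, and the `meta` attribute are assumptions about the installed Yellowbrick version; `class_balance` and `demo` are hypothetical names introduced only for this example.

```python
from collections import Counter


def class_balance(y):
    """Count how many instances carry each target label."""
    return Counter(y)


def demo():
    # Imported here so class_balance stays usable without Yellowbrick installed.
    from yellowbrick.datasets import load_mushroom

    # Default mode: feature matrix X and target vector y.
    X, y = load_mushroom()
    print(X.shape)
    print(class_balance(y))

    # Dataset mode: one object that exposes the data plus its metadata.
    dataset = load_mushroom(return_dataset=True)
    X2, y2 = dataset.to_data()  # assumed accessor on the Dataset object
    print(dataset.meta)         # assumed metadata attribute
```

Calling `demo()` requires Yellowbrick to be installed and, on first use, an Internet connection to download the data. Because the features are categorical strings, they typically need to be encoded (for example, one-hot) before being passed to most classifiers.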