The dataset was created and donated to the UCI ML Repository by John Tromp (tromp ‘@’

Samples total    67557
Dimensionality   42
Features         categorical
Targets          str: {“win”, “loss”, “draw”}
Task(s)          multiclass classification




This database contains all legal 8-ply positions in the game of Connect-4 in which neither player has won yet and in which the next move is not forced.

The symbol x represents the first player and o the second. Each instance encodes the state of the game as the contents of a 6x7 board. The outcome class is the game-theoretic value of the position for the first player.
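
To make the representation concrete, the sketch below renders one 42-value row as a 6x7 grid. It relies on details not stated above, so treat them as assumptions: that the features are ordered column by column (a1 to a6, then b1 to b6, and so on, as in the UCI attribute list), that row 1 is the bottom of the board, and that blank squares are marked "b". The helper name render_board and the sample position are hypothetical and not taken from the dataset.

```python
ROWS, COLS = 6, 7

def render_board(row, blank="b"):
    """Return a multi-line string showing a 42-value row as a 6x7 board.

    Assumes column-major order: the first six values are column a from
    bottom (a1) to top (a6), the next six are column b, and so on.
    """
    if len(row) != ROWS * COLS:
        raise ValueError(f"expected {ROWS * COLS} values, got {len(row)}")
    lines = []
    # Emit the top row (row 6) first so the board appears upright.
    for r in reversed(range(ROWS)):
        cells = [row[c * ROWS + r] for c in range(COLS)]
        lines.append(" ".join("." if v == blank else v for v in cells))
    return "\n".join(lines)

# Hypothetical 8-ply position: four pieces per player, stacked bottom-up.
columns = {"a": "", "b": "", "c": "xo", "d": "xoxo", "e": "ox", "f": "", "g": ""}

sample_row = []
for name in "abcdefg":
    stack = columns[name]
    # Pad each column with blanks up to the full height of six squares.
    sample_row.extend(list(stack) + ["b"] * (ROWS - len(stack)))

print(render_board(sample_row))
```

An 8-ply position always holds eight pieces, four per player, which is a quick sanity check on any decoded row.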


Note that to use the game dataset, the categorical data in the features array must first be encoded numerically. Several numeric encoders are available, such as sklearn.preprocessing.OrdinalEncoder or sklearn.preprocessing.OneHotEncoder, which may be used as follows:

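from sklearn.preprocessing import OneHotEncoder
from yellowbrick.datasets import load_game

# Load the categorical features and the win/loss/draw target
X, y = load_game()
# Replace each categorical column with one binary column per category
X = OneHotEncoder().fit_transform(X)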
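
The other encoder mentioned above, OrdinalEncoder, can be swapped in the same way. The sketch below uses a tiny hand-made array in place of load_game() so that it runs offline; the cell values "b", "o", and "x" (with "b" assumed to mark a blank square) and the explicit category order are assumptions made for illustration.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Stand-in for a few feature columns of the game data (hypothetical values).
X_small = np.array([
    ["b", "x", "o"],
    ["x", "o", "b"],
    ["o", "b", "x"],
])

# Fix the category order per column so the integer codes are predictable:
# b -> 0, o -> 1, x -> 2.
encoder = OrdinalEncoder(categories=[["b", "o", "x"]] * X_small.shape[1])
X_ordinal = encoder.fit_transform(X_small)

print(X_ordinal)
```

Ordinal codes imply an ordering among the categories, which some estimators will treat as meaningful, so one-hot encoding is often the safer default for this kind of data.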


Downloaded from the UCI Machine Learning Repository on May 4, 2017.


yellowbrick.datasets.loaders.load_game(data_home=None, return_dataset=False)

Load the Connect-4 game multivariate and spatial dataset that is well suited to multiclass classification tasks. The dataset contains 67557 instances with 42 categorical attributes and a discrete target.

Note that the game data is stored with categorical features that need to be numerically encoded before use with scikit-learn estimators. We recommend using sklearn.preprocessing.OneHotEncoder for this task and developing a Pipeline with this dataset.
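
A minimal sketch of such a Pipeline is shown below. It trains on randomly generated stand-in data shaped like the game features (42 categorical columns) so that it runs without a download. The random data, the "b" blank marker, and the choice of LogisticRegression are assumptions for illustration and are not part of the Yellowbrick documentation; with the real data, X and y would come from load_game().

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)

# Stand-in data: 200 random "boards" with 42 categorical cells each.
X_demo = rng.choice(["b", "o", "x"], size=(200, 42))
y_demo = rng.choice(["win", "loss", "draw"], size=200)

model = Pipeline([
    # handle_unknown="ignore" encodes categories unseen during fit as all
    # zeros at predict time instead of raising an error.
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ("clf", LogisticRegression(max_iter=1000)),
])

model.fit(X_demo, y_demo)
predictions = model.predict(X_demo[:5])
print(predictions)
```

Keeping the encoder inside the Pipeline means it is fitted only on the training portion during cross-validation, which avoids leaking information from held-out rows.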

The Yellowbrick datasets are hosted online and, when requested, are downloaded to your local computer for use. If the dataset hasn’t been downloaded before, an Internet connection is required; if the data is already cached locally, nothing is downloaded. Yellowbrick checks the downloaded data against the known signature of the dataset to ensure the download completed successfully.

Datasets are stored alongside the code, but the location can be specified with the data_home parameter or the $YELLOWBRICK_DATA envvar.

data_home : str, optional

The path on disk where data is stored. If not passed in, it is looked up from $YELLOWBRICK_DATA or the default returned by get_data_home.
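
The lookup order described above can be sketched as follows. This is an illustration rather than Yellowbrick's implementation: the helper name resolve_data_home and the default_home argument are hypothetical, and the real default is whatever get_data_home returns.

```python
import os

def resolve_data_home(data_home=None, default_home="fixtures"):
    """Pick a data directory: explicit argument, then $YELLOWBRICK_DATA, then a default."""
    if data_home is not None:
        return data_home
    return os.environ.get("YELLOWBRICK_DATA", default_home)

# With no argument and no environment variable, the default is used.
os.environ.pop("YELLOWBRICK_DATA", None)
print(resolve_data_home())

# Once set, the environment variable takes precedence over the default.
os.environ["YELLOWBRICK_DATA"] = "/tmp/yb-data"
print(resolve_data_home())

# An explicit argument takes precedence over both.
print(resolve_data_home("/data/custom"))
```

With the loader itself, the equivalent explicit form is load_game(data_home="/data/custom"), where the path shown is only an example.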

return_dataset : bool, default=False

Return the raw dataset object instead of X and y numpy arrays to get access to alternative targets, extra features, content and meta.

X : array-like with shape (n_instances, n_features) if return_dataset=False

A pandas DataFrame or numpy array describing the instance features.

y : array-like with shape (n_instances,) if return_dataset=False

A pandas Series or numpy array describing the target vector.

dataset : Dataset instance if return_dataset=True

The Yellowbrick Dataset object provides an interface for accessing the data in a variety of formats, as well as the associated metadata and content.