Experimental data used for binary classification (room occupancy) from temperature, humidity, light and CO2 measurements. Ground-truth occupancy was obtained from time-stamped pictures taken every minute.

Samples total: 20560

Features: real, positive

Targets: int: {1 for occupied, 0 for not occupied}

Samples per class

Three data sets are submitted, for training and testing. For the journal publication, the processing R scripts can be found on GitHub.


Downloaded from the UCI Machine Learning Repository on October 13, 2016.

Candanedo, Luis M., and Véronique Feldheim. “Accurate occupancy detection of an office room from light, temperature, humidity and CO2 measurements using statistical learning models.” Energy and Buildings 112 (2016): 28-39.

yellowbrick.datasets.loaders.load_occupancy(data_home=None, return_dataset=False)

Loads the occupancy multivariate, time-series dataset, which is well suited to binary classification tasks. The dataset contains 20560 instances with 5 real-valued attributes and a discrete target.

The Yellowbrick datasets are hosted online; when one is requested, it is downloaded to your local computer for use. If the dataset has not been downloaded before, an Internet connection is required. If the data is already cached locally, nothing is downloaded. Yellowbrick checks the downloaded data against the known signature of the dataset to ensure that the download completed successfully.
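The signature check described above can be illustrated with a minimal sketch. It assumes the signature is a SHA-256 hex digest of the downloaded file; the source only says "known signature", so the hash algorithm and the helper names `sha256_of_file` and `verify_download` are illustrative assumptions, not Yellowbrick's own API.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large archives need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_signature):
    """Raise ValueError if the file does not match the expected digest.

    Hypothetical helper: assumes the "known signature" is a SHA-256 hex string.
    """
    actual = sha256_of_file(path)
    if actual != expected_signature:
        raise ValueError(
            "signature mismatch: expected {}, got {}".format(expected_signature, actual)
        )
    return True
```

A mismatch usually means an interrupted or corrupted download, in which case the file would be fetched again.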

Datasets are stored alongside the code, but the location can be specified with the data_home parameter or the $YELLOWBRICK_DATA environment variable.
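The lookup order for the storage location (explicit argument first, then the environment variable, then a default) might be sketched as follows. The function `resolve_data_home` and its `default` argument are hypothetical and only illustrate the precedence; the real default is whatever get_data_home returns.

```python
import os


def resolve_data_home(data_home=None, environ=None, default="yellowbrick_data"):
    """Pick the dataset directory using the precedence described in the text.

    Hypothetical sketch: `default` is a placeholder, not Yellowbrick's
    actual default location.
    """
    environ = os.environ if environ is None else environ
    if data_home is None:
        # Fall back to $YELLOWBRICK_DATA, then to the placeholder default.
        data_home = environ.get("YELLOWBRICK_DATA", default)
    return os.path.abspath(os.path.expanduser(data_home))
```

Passing `environ` explicitly keeps the sketch easy to exercise without changing the process environment.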

data_home : str, optional

The path on disk where data is stored. If not passed in, it is looked up from $YELLOWBRICK_DATA or the default returned by get_data_home.

return_dataset : bool, default=False

Return the raw dataset object instead of X and y numpy arrays to get access to alternative targets, extra features, content and meta.

X : array-like with shape (n_instances, n_features) if return_dataset=False

A pandas DataFrame or numpy array describing the instance features.

y : array-like with shape (n_instances,) if return_dataset=False

A pandas Series or numpy array describing the target vector.

dataset : Dataset instance if return_dataset=True

The Yellowbrick Dataset object provides an interface for accessing the data in a variety of formats, as well as associated metadata and content.
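As a usage sketch, the two return modes documented above could be wrapped as shown below. The wrapper names `load_occupancy_arrays`, `load_occupancy_dataset` and `class_counts` are illustrative helpers, not part of Yellowbrick; only `load_occupancy` and its `data_home` and `return_dataset` parameters come from the signature above. The import is done lazily so that the helpers can be defined without Yellowbrick installed.

```python
from collections import Counter


def load_occupancy_arrays(data_home=None):
    """Return (X, y), the default output when return_dataset=False."""
    # Lazy import: requires Yellowbrick, and a network connection on first use.
    from yellowbrick.datasets.loaders import load_occupancy

    X, y = load_occupancy(data_home=data_home)
    return X, y


def load_occupancy_dataset(data_home=None):
    """Return the raw Dataset object for access to metadata and content."""
    from yellowbrick.datasets.loaders import load_occupancy

    return load_occupancy(data_home=data_home, return_dataset=True)


def class_counts(y):
    """Count samples per class (1 for occupied, 0 for not occupied)."""
    return dict(Counter(int(label) for label in y))
```

Calling `class_counts(y)` on the loaded target gives the samples per class, which is a quick way to check the class balance before fitting a classifier.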