
Sample Data Sets

Statistics and Machine Learning Toolbox™ software includes the sample data sets in the following table.

To load a data set into the MATLAB® workspace, type:

load filename

where filename is one of the files listed in the table.
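The following minimal sketch shows both forms of load, using fisheriris.mat as the example. The command syntax described above places the file's variables directly in the workspace. The function syntax returns the same variables as fields of a structure instead. The variable names meas and species are assumed from the usual contents of fisheriris.mat, so confirm them with whos if in doubt.

```matlab
% Command syntax: the variables stored in fisheriris.mat are placed directly
% in the workspace. The .mat extension may be omitted.
load fisheriris

% Function syntax: the same variables are returned as fields of a structure,
% so existing workspace variables with the same names are not overwritten.
S = load('fisheriris.mat');

% Assumed contents: meas (150-by-4 numeric matrix of measurements) and
% species (150-by-1 cell array of class labels).
disp(size(S.meas))
disp(class(S.species))
```

The function form is usually the safer choice inside scripts and functions, because it makes explicit which variables come from the file.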

Depending on the data set, a file contains individual data variables, description variables with references, and dataset arrays that encapsulate the data set and its description.
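Because the variables differ from file to file, it can help to list a file's contents before loading it. The sketch below uses whos with the -file option on carsmall.mat and then loads only two of its variables. The names MPG and Weight are assumed to be among the variables in carsmall.mat.

```matlab
% List the variables stored in the MAT-file without loading them.
whos -file carsmall.mat

% The function form returns a structure array with one element per variable,
% including fields such as name, size, and class.
info = whos('-file', 'carsmall.mat');
varNames = {info.name}

% Load only the selected variables (assumed to exist in carsmall.mat).
load carsmall MPG Weight
```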

File                Description of Data Set
acetylene.mat       Chemical reaction data with correlated predictors
arrhythmia.mat      Cardiac arrhythmia data from the UCI machine learning repository
batterysmall.mat    Sensor data (voltage, current, and temperature) and state of charge for a Li-ion battery; a subset of the data in [1]
carbig.mat          Measurements of cars, 1970–1982
carsmall.mat        Subset of carbig.mat. Measurements of cars, 1970, 1976, 1982
census1994.mat      Adult data from the UCI machine learning repository
cereal.mat          Breakfast cereal ingredients
cities.mat          Quality of life ratings for U.S. metropolitan areas
discrim.mat         A version of cities.mat used for discriminant analysis
examgrades.mat      Exam grades on a scale of 0–100
fisheriris.mat      Fisher's 1936 iris data
flu.mat             Google Flu Trends estimated ILI (influenza-like illness) percentage for various regions of the US, and CDC weighted ILI percentage based on sentinel provider reports
gas.mat             Gasoline prices around the state of Massachusetts in 1993
hald.mat            Heat of cement vs. mix of ingredients
hogg.mat            Bacteria counts in different shipments of milk
hospital.mat        Simulated hospital data
humanactivity.mat   Human activity recognition data of five activities: sitting, standing, walking, running, and dancing
imports-85.mat      1985 Auto Imports Database from the UCI repository
ionosphere.mat      Ionosphere dataset from the UCI machine learning repository
kmeansdata.mat      Four-dimensional clustered data
lawdata.mat         Grade point average and LSAT scores from 15 law schools
mileage.mat         Mileage data for three car models from two factories
moore.mat           Biochemical oxygen demand on five predictors
morse.mat           Recognition of Morse code distinctions by non-coders
nlpdata.mat         Natural language processing data extracted from the MathWorks® documentation
ovariancancer.mat   Grouped observations on 4000 predictors [2][3]
parts.mat           Dimensional run-out on 36 circular parts
polydata.mat        Sample data for polynomial fitting
popcorn.mat         Popcorn yield by popper type and brand
reaction.mat        Reaction kinetics for Hougen-Watson model
spectra.mat         NIR spectra and octane numbers of 60 gasoline samples
stockreturns.mat    Simulated stock returns

References

[1] Kollmeyer, Phillip, Carlos Vidal, Mina Naguib, and Michael Skells. "LG 18650HG2 Li-ion Battery Data and Example Deep Neural Network xEV SOC Estimator Script." Mendeley 3 (March 2020). https://doi.org/10.17632/CP3473X7XV.3.

[2] Conrads, Thomas P., Vincent A. Fusaro, Sally Ross, Don Johann, Vinodh Rajapakse, Ben A. Hitt, Seth M. Steinberg, et al. "High-Resolution Serum Proteomic Features for Ovarian Cancer Detection." Endocrine-Related Cancer 11 (2004): 163–78.

[3] Petricoin, Emanuel F., Ali M. Ardekani, Ben A. Hitt, Peter J. Levine, Vincent A. Fusaro, Seth M. Steinberg, Gordon B. Mills, et al. "Use of Proteomic Patterns in Serum to Identify Ovarian Cancer." The Lancet 359, no. 9306 (February 2002): 572–77.