Sample Data Sets
Statistics and Machine Learning Toolbox™ software includes the sample data sets in the following table.
To load a data set into the MATLAB® workspace, type:
load filename
where filename is one of the files listed in the table.
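As a minimal sketch, the command form shown above can also be written in function form, which returns the variables in a structure. The example uses fisheriris.mat from the table; the name S is an arbitrary choice for illustration.

```matlab
% Command form: the variables in the MAT-file are created directly
% in the current workspace.
load fisheriris

% Function form: collect the variables in a structure instead, which
% avoids overwriting workspace variables that share a name.
S = load('fisheriris.mat');
disp(fieldnames(S))   % names of the variables stored in the file
```

The function form is convenient in scripts and functions, where variables appearing implicitly in the workspace are harder to track.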
Data sets contain individual data variables, description variables with references, and dataset arrays encapsulating the data set and its description, as appropriate.
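To check which variables a particular file provides, one option is to list its contents before loading it. The sketch below assumes hospital.mat from the table is on the MATLAB path; info is an illustrative name.

```matlab
% List the names, sizes, and classes of the variables stored in the
% file without loading them.
whos('-file', 'hospital.mat')

% Capture the same listing as a structure array for programmatic use.
info = whos('-file', 'hospital.mat');
for k = 1:numel(info)
    fprintf('%s: %s\n', info(k).name, info(k).class);
end
```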
File | Description of Data Set |
---|---|
acetylene.mat | Chemical reaction data with correlated predictors |
arrhythmia.mat | Cardiac arrhythmia data from the UCI machine learning repository |
batterysmall.mat | Sensor data (voltage, current, and temperature) and state of charge for a Li-ion battery; a subset of the data in [1] |
carbig.mat | Measurements of cars, 1970–1982 |
carsmall.mat | Subset of carbig.mat. Measurements of cars, 1970, 1976, 1982 |
census1994.mat | Adult data from the UCI machine learning repository |
cereal.mat | Breakfast cereal ingredients |
cities.mat | Quality of life ratings for U.S. metropolitan areas |
discrim.mat | A version of cities.mat used for discriminant analysis |
examgrades.mat | Exam grades on a scale of 0–100 |
fisheriris.mat | Fisher's 1936 iris data |
flu.mat | Google Flu Trends estimated ILI (influenza-like illness) percentage for various regions of the US, and CDC weighted ILI percentage based on sentinel provider reports |
gas.mat | Gasoline prices around the state of Massachusetts in 1993 |
hald.mat | Heat of cement vs. mix of ingredients |
hogg.mat | Bacteria counts in different shipments of milk |
hospital.mat | Simulated hospital data |
humanactivity.mat | Human activity recognition data of five activities: sitting, standing, walking, running, and dancing |
imports-85.mat | 1985 Auto Imports Database from the UCI repository |
ionosphere.mat | Ionosphere dataset from the UCI machine learning repository |
kmeansdata.mat | Four-dimensional clustered data |
lawdata.mat | Grade point average and LSAT scores from 15 law schools |
mileage.mat | Mileage data for three car models from two factories |
moore.mat | Biochemical oxygen demand on five predictors |
morse.mat | Recognition of Morse code distinctions by non-coders |
nlpdata.mat | Natural language processing data extracted from the MathWorks® documentation |
ovariancancer.mat | Grouped observations on 4000 predictors [2][3] |
parts.mat | Dimensional run-out on 36 circular parts |
polydata.mat | Sample data for polynomial fitting |
popcorn.mat | Popcorn yield by popper type and brand |
reaction.mat | Reaction kinetics for Hougen-Watson model |
spectra.mat | NIR spectra and octane numbers of 60 gasoline samples |
stockreturns.mat | Simulated stock returns |
References
[1] Kollmeyer, Phillip, Carlos Vidal, Mina Naguib, and Michael Skells. "LG 18650HG2 Li-ion Battery Data and Example Deep Neural Network xEV SOC Estimator Script." Mendeley 3 (March 2020). https://doi.org/10.17632/CP3473X7XV.3.
[2] Conrads, Thomas P., Vincent A. Fusaro, Sally Ross, Don Johann, Vinodh Rajapakse, Ben A. Hitt, Seth M. Steinberg, et al. "High-Resolution Serum Proteomic Features for Ovarian Cancer Detection." Endocrine-Related Cancer 11 (2004): 163–78.
[3] Petricoin, Emanuel F., Ali M. Ardekani, Ben A. Hitt, Peter J. Levine, Vincent A. Fusaro, Seth M. Steinberg, Gordon B. Mills, et al. "Use of Proteomic Patterns in Serum to Identify Ovarian Cancer." The Lancet 359, no. 9306 (February 2002): 572–77.