File Ensemble Datastore Using Data in Text Files

In predictive maintenance algorithm design, you frequently have system data in a plain text format such as comma-separated values (CSV). This example shows how to create and use a fileEnsembleDatastore object to manage an ensemble of data stored in such a format.

Ensemble Data

Extract the compressed data for the example.

unzip  % extract compressed files

The ensemble consists of ten files, fleetdata_01.txt, ..., fleetdata_10.txt, each containing data for one car in a fleet of cars. Each file contains five unlabeled columns of data, corresponding to daily readings of the following values:

  • Odometer reading at the end of the day, in miles

  • Fuel consumed that day, in gallons

  • Maximum rpm for the day

  • Maximum engine temperature for the day, in degrees Celsius

  • Engine light status at the end of the day (0 = off, 1 = on)

Each file contains data for between about 80 and about 120 days of operation. The data sets were artificially manufactured for this example and do not correspond to real fleet data.

Configure the Ensemble Datastore

Create a fileEnsembleDatastore object to manage the data.

location = pwd;
extension = '.txt';
fensemble = fileEnsembleDatastore(location,extension);

Configure the ensemble datastore to use the provided function readFleetData.m to read data from the files.

fensemble.ReadFcn = @readFleetData;

Because the columns in the data files are unlabeled, the function readFleetData attaches a predefined label to the corresponding data. Configure the ensemble data variables to match the labels defined in readFleetData.

fensemble.DataVariables = ["Odometer";"FuelConsump";"MaxRPM";"MaxTemp";"EngineLight"];

The function readFleetData also parses the file name to return an ID of the car from which the data was collected, a number from 1 through 10. This ID is the ensemble independent variable.

fensemble.IndependentVariables = "ID";
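The source of readFleetData.m is not reproduced here. As a rough illustration of what such a read function has to do, the following is a minimal sketch under stated assumptions: the name readFleetDataSketch, the use of readmatrix, and the daily time vector starting at 0 days are all hypothetical choices, not the shipped implementation. The one firm requirement is the ReadFcn signature: it takes a file name and the list of selected variables, and returns a one-row table.

```matlab
function data = readFleetDataSketch(filename,variables)
% Hypothetical stand-in for readFleetData; the provided function may differ.
% Column order assumed to match the list of daily readings described above.
labels = ["Odometer","FuelConsump","MaxRPM","MaxTemp","EngineLight"];
variables = string(variables);

x = readmatrix(filename);           % numeric array, one column per reading
t = days(0:size(x,1)-1)';           % assumed: one row per day, from 0 days
[~,name] = fileparts(filename);     % for example, "fleetdata_03"

data = table();
for k = 1:numel(variables)
    v = variables(k);
    if v == "ID"
        % Parse the car ID from the digits after the underscore
        data.ID = str2double(extractAfter(name,"_"));
    else
        col = x(:,labels == v);
        % Store each signal as a timetable inside a cell, one row per member
        data.(char(v)) = {timetable(t,col,'VariableNames',{'Var1'})};
    end
end
end
```

Returning each signal as a timetable in a cell keeps every ensemble member to a single table row, regardless of how many days of data the file contains.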

Specify all data variables and the independent variable as selected variables for reading from the ensemble datastore.

fensemble.SelectedVariables = [fensemble.IndependentVariables;fensemble.DataVariables];
fensemble
fensemble = 
  fileEnsembleDatastore with properties:

                 ReadFcn: @readFleetData
        WriteToMemberFcn: []
           DataVariables: [5x1 string]
    IndependentVariables: "ID"
      ConditionVariables: [0x0 string]
       SelectedVariables: [6x1 string]
                ReadSize: 1
              NumMembers: 10
          LastMemberRead: [0x0 string]
                   Files: [10x1 string]

Read Ensemble Data

When you call read on the ensemble datastore, it uses readFleetData to read the selected variables from the first ensemble member.

data1 = read(fensemble)
data1=1×6 table
    ID        Odometer            FuelConsump            MaxRPM               MaxTemp            EngineLight   
    __    _________________    _________________    _________________    _________________    _________________

    1     {120x1 timetable}    {120x1 timetable}    {120x1 timetable}    {120x1 timetable}    {120x1 timetable}

Examine and plot the odometer data.

odo1 = data1.Odometer{1}
odo1=120×1 timetable
     Time       Var1 
    _______    ______

    0 days     180.04
    1 day      266.76
    2 days     396.01
    3 days     535.19
    4 days     574.31
    5 days     714.82
    6 days     714.82
    7 days     821.44
    8 days     1030.5
    9 days     1213.4
    10 days    1303.4
    11 days    1416.9
    12 days    1513.5
    13 days    1513.5
    14 days    1697.1
    15 days    1804.6
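The listing above shows the extraction step but no plotting command. One way to plot the odometer signal is sketched below. To keep the snippet runnable on its own, it builds a small stand-in timetable (odoDemo, a hypothetical name) from the first eight rows displayed above. With the example data loaded, you would plot odo1.Time against odo1.Var1 instead.

```matlab
% Stand-in for odo1, built from the first eight rows shown above
Time = days(0:7)';
Var1 = [180.04; 266.76; 396.01; 535.19; 574.31; 714.82; 714.82; 821.44];
odoDemo = timetable(Time,Var1);

% Plot odometer reading against elapsed time
plot(odoDemo.Time,odoDemo.Var1,'-o')
xlabel('Time')
ylabel('Odometer reading (miles)')
```

Flat segments in the curve, such as between 5 days and 6 days, correspond to days on which the odometer did not advance.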


Compute the average gas mileage for this member of the fleet. This value is the odometer reading on the last day divided by the total fuel consumed, which assumes that the odometer read zero at the start of the first day.

fuelConsump1 = data1.FuelConsump{1}.Var1;
totalConsump1 = sum(fuelConsump1);
totalMiles1 = odo1.Var1(end);
mpg1 = totalMiles1/totalConsump1
mpg1 = 22.3086
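Because the same calculation is repeated for every ensemble member, it can be convenient to wrap it in a small function. The helper below is a hypothetical sketch (averageMileage is not part of the original example). It applies the same formula as above and returns NaN when there is no data or no recorded fuel consumption, so that an empty or malformed file does not produce Inf or an error.

```matlab
function mpg = averageMileage(odometer,fuel)
% averageMileage  Hypothetical helper, not part of the original example.
%   odometer: end-of-day odometer readings, in miles
%   fuel:     fuel consumed each day, in gallons
totalFuel = sum(fuel,'omitnan');
if isempty(odometer) || totalFuel <= 0
    mpg = NaN;                        % no data, or no fuel recorded
else
    mpg = odometer(end)/totalFuel;    % last odometer reading / total fuel
end
end
```

With the variables from this example, the call would be averageMileage(odo1.Var1,data1.FuelConsump{1}.Var1).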

Batch-Process Data from All Ensemble Members

If you call read again, it reads data from the next ensemble member and updates the LastMemberRead property of fensemble to the file name of that member. You can repeat the processing steps to compute the average gas mileage for that member. In practice, it is more useful to automate reading and processing the data. To do so, first reset the ensemble datastore to a state in which no data has been read. Then loop through the ensemble, reading and processing each member in turn, and collect each car's ID and average gas mileage in a table. (If you have Parallel Computing Toolbox™, you can use it to speed up the processing of larger data ensembles.)

reset(fensemble)          % return to the state in which no data has been read
mpgData = zeros(10,2);    % preallocate array for 10 ensemble members
ct = 1;
while hasdata(fensemble)
    data = read(fensemble);
    odo = data.Odometer{1}.Var1;
    fuelConsump = data.FuelConsump{1}.Var1;
    totalConsump = sum(fuelConsump);
    mpg = odo(end)/totalConsump;    % use this member's own fuel total
    ID = data.ID;
    mpgData(ct,:) = [ID,mpg];
    ct = ct + 1;
end
mpgTable = array2table(mpgData,'VariableNames',{'ID','mpg'})
mpgTable=10×2 table
    ID     mpg  
    __    ______

     1    22.309
     2    19.327
     3    20.816
     4    27.464
     5    18.848
     6    22.517
     7    27.018
     8    27.284
     9    17.149
    10     26.37
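For larger ensembles, the same loop can be distributed across workers. The following is a minimal sketch, assuming Parallel Computing Toolbox is installed and that fensemble is configured as above. It relies on the numpartitions and partition functions for ensemble datastores. The variable names results, sub, rows, and mpgTablePar are illustrative only.

```matlab
% Sketch only: requires Parallel Computing Toolbox and the configured fensemble
n = numpartitions(fensemble,gcp);    % partitions suited to the current pool
results = cell(n,1);

parfor k = 1:n
    sub = partition(fensemble,n,k);  % subset of members for this iteration
    rows = zeros(0,2);
    while hasdata(sub)
        data = read(sub);
        odo = data.Odometer{1}.Var1;
        fuel = data.FuelConsump{1}.Var1;
        rows(end+1,:) = [data.ID, odo(end)/sum(fuel)];
    end
    results{k} = rows;               % collect [ID, mpg] pairs per partition
end

% Combine the partitions and sort by car ID
mpgTablePar = array2table(sortrows(vertcat(results{:})), ...
    'VariableNames',{'ID','mpg'});
```

Each worker reads from its own partition, so no two workers advance the same datastore. For an ensemble of only ten small files, the overhead of starting a parallel pool is likely to outweigh any speedup.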
