fileEnsembleDatastore
Manage ensemble data in custom file format
Description
A fileEnsembleDatastore object is a datastore specialized for use in developing algorithms for condition monitoring and predictive maintenance using measured data.
An ensemble is a collection of member data stored in a collection of files. The fileEnsembleDatastore object specifies the data variables, independent variables, and condition variables in the ensemble. You provide functions that tell the fileEnsembleDatastore object how to read each type of variable from the collection of files. Therefore, you can use fileEnsembleDatastore to manage ensemble data stored in any file format or configuration of variables.
The data for a fileEnsembleDatastore object can be stored at any location supported by MATLAB® datastores, including remote locations, such as cloud storage using Amazon S3™ (Simple Storage Service), Windows Azure® Blob Storage, and Hadoop® Distributed File System (HDFS™).
For a detailed example illustrating the use of a file ensemble datastore, see File Ensemble Datastore with Measured Data. For general information about data ensembles in Predictive Maintenance Toolbox™, see Data Ensembles for Condition Monitoring and Predictive Maintenance.
Creation
Syntax
fensemble = fileEnsembleDatastore(location,extension)
fensemble = fileEnsembleDatastore(location,extension,Name,Value)
Description
fensemble = fileEnsembleDatastore(location,extension) creates a fileEnsembleDatastore object that points to data at the file path specified by location and having the specified file extension. Set properties of the object to specify the functions for reading from and writing to the ensemble datastore.
fensemble = fileEnsembleDatastore(location,extension,Name,Value) specifies additional properties of the object using one or more name-value pair arguments. For example, using 'ConditionVariables',["FaultCond";"ID"] specifies the condition variables when you create the object.
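As a rough sketch of the name-value syntax, the following creates a few throwaway MAT-files and points a datastore at them. The folder name, the file names, and the variables Vibration, FaultCond, and ID are hypothetical and chosen only for illustration. Passing DataVariables as a name-value argument assumes that it can be set at creation in the same way as ConditionVariables.

```matlab
% Hypothetical demo folder and member files (not part of any shipped example)
location = fullfile(tempdir,'fensembleCreateDemo');
if ~isfolder(location)
    mkdir(location);
end
for k = 1:3
    Vibration = randn(100,1);   % stand-in for a measured signal
    FaultCond = double(k > 1);  % stand-in fault label
    ID = k;                     % stand-in machine identifier
    save(fullfile(location,sprintf('member_%02d.mat',k)), ...
        'Vibration','FaultCond','ID');
end

% Create the datastore and set variable-name properties in the same call
fensemble = fileEnsembleDatastore(location,'.mat', ...
    'DataVariables',"Vibration", ...
    'ConditionVariables',["FaultCond";"ID"]);
```

The read and write functions still have to be assigned afterward before read or writeToLastMemberRead can be used.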
Input Arguments
location — Files or folders
string | character vector | string array | cell array of character vectors
Files or folders from which to read ensemble data, specified as a string, character vector, string array, or cell array of character vectors. If the files are not in the current folder, then location must contain full or relative paths.
If you specify a folder, then fileEnsembleDatastore uses all files in that folder with the extension specified by extension. Alternatively, specify an explicit list of files to include. You can also use the wildcard character (*) when specifying location. This character indicates that all matching files or all files in the matching folders are included in the datastore.
The file path can be any location supported by MATLAB datastores, including an IRI path pointing to a remote location, such as cloud storage using Amazon S3 (Simple Storage Service), Windows Azure Blob Storage, and Hadoop Distributed File System (HDFS). For more information about working with remote data in MATLAB, see Work with Remote Data.
Example: pwd + "\simResults"
Example: {'C:\dir\data\file1.xls','C:\dir\data\file2.xlsx'}
Example: "../dir/data/*.mat"
extension — File extension
string | character vector | string vector
File extension for files in the datastore, specified as a string or a character vector, such as ".mat" or '.csv'.
If the datastore contains files having more than one extension, specify them as a string vector, such as [".xls",".xlsx"]. The functions that you supply for the ReadFcn and WriteToMemberFcn properties must be able to interact with all specified file types.
Properties
ReadFcn — Function for reading all selected variables
[] (default) | function handle
Function for reading all selected variables from the ensemble, specified as a handle to a function you provide. You write a function that instructs the software how to read variables from a data file containing a member of your ensemble. The function has:
Two inputs, a file name (string), and the names of signals (string vector) to load from the file
One output, a table row with one table variable for each requested variable
When you specify ReadFcn, the software uses this function to read all selected variables from the ensemble, regardless of whether they are named in DataVariables, IndependentVariables, or ConditionVariables.
For example, suppose that you write the following function, readVars, for reading variables from your files. This function creates a table containing the variables in a data file that match those in the input string vector, variables.
function data = readVars(filename,variables)
    data = table();
    mfile = matfile(filename); % Allows partial loading
    for ct = 1:numel(variables)
        val = mfile.(variables{ct});
        if numel(val) > 1
            % Wrap nonscalar values in a cell so they fit in one table row
            val = {val};
        end
        data.(variables{ct}) = val;
    end
end
Save the function in a MATLAB file in the current folder or on the path. Then, if you create a fileEnsembleDatastore called fensemble, set ReadFcn as follows.
fensemble.ReadFcn = @readVars;
When you call read(fensemble), the software uses readVars to read all the variables in the SelectedVariables property of the ensemble datastore. You must set the ReadFcn property to read data from a fileEnsembleDatastore member. Otherwise, read generates an error.
WriteToMemberFcn — Function for adding data
[] (default) | function handle
Function for writing data to the last-read member of the ensemble, specified as a handle to a function you provide. You write a function that instructs the software how to write variables to a data file containing a member of your ensemble. The function has:
Two inputs, a file name (string), and a data structure whose field names are the data variables to write, and whose values are the corresponding values
No outputs
For example, suppose that you write the following function, writeNewData, for writing data to your files. This function writes an input data structure named data to the specified data file.
function writeNewData(filename,data)
    % Append each field of data to the file as its own variable
    save(filename, '-append', '-struct', 'data');
end
Store writeNewData in a MATLAB file in the current folder or on the path. Then, if you create a fileEnsembleDatastore called fensemble, set WriteToMemberFcn as follows:
fensemble.WriteToMemberFcn = @writeNewData;
When you call the writeToLastMemberRead command on fensemble, the software uses writeNewData to add the new data to the data file of the last-read ensemble member. You must set this property to add data to a fileEnsembleDatastore member. Otherwise, writeToLastMemberRead generates an error.
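The save options used in writeNewData can be checked on their own, without a datastore. The sketch below uses a hypothetical file name and hypothetical field names (gsMean, gsPeak). It shows that '-struct' stores each field of the structure as a separate variable and that '-append' keeps the variables already in the file.

```matlab
% Hypothetical member file that already holds one measured variable
filename = fullfile(tempdir,'writeNewDataDemo.mat');
Vibration = randn(50,1);
save(filename,'Vibration');

% Structure of the kind that a WriteToMemberFcn function receives
data = struct('gsMean',0.5,'gsPeak',2.1);

% Same call as in writeNewData: each field becomes its own variable
save(filename,'-append','-struct','data');

% List the variables now stored in the file
vars = who('-file',filename);
disp(vars)
```

Because the new variables are appended, a later read of the same member can request both the original signal and the derived values.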
DataVariables — Data variables in the ensemble
[] (default) | string array
Data variables in the ensemble, specified as a string array. Data variables are the main content of the members of an ensemble. Data variables can include measured data or derived data for analysis and development of predictive maintenance algorithms. For example, your data variables might include measured or simulated vibration signals and derived values such as mean vibration value or peak vibration frequency. In practice, your data variables, independent variables, and condition variables are all distinct sets of variables.
You can also specify DataVariables using a cell array of character vectors, such as {'Vibration';'Tacho'}, but the variable names are always stored as a string array, ["Vibration";"Tacho"]. If you specify a matrix of variable names, the matrix is flattened to a column vector.
IndependentVariables — Independent variables in the ensemble
[] (default) | string array
Independent variables in the ensemble, specified as a string array. You typically use independent variables to order the members of an ensemble. Examples are timestamps, number of operating hours, or miles driven. Set this property to the names of such variables in your ensemble. In practice, your data variables, independent variables, and condition variables are all distinct sets of variables.
You can also specify IndependentVariables using a cell array of character vectors, such as {'Time';'Age'}, but the variable names are always stored as a string array, ["Time";"Age"]. If you specify a matrix of variable names, the matrix is flattened to a column vector.
ConditionVariables — Condition variables in the ensemble
[] (default) | string array
Condition variables in the ensemble, specified as a string array. Use condition variables to label the members in an ensemble according to the fault condition or other operating condition under which the ensemble member was collected. In practice, your data variables, independent variables, and condition variables are all distinct sets of variables.
You can also specify ConditionVariables using a cell array of character vectors, such as {'GearFault';'Temperature'}, but the variable names are always stored as a string array, ["GearFault";"Temperature"]. If you specify a matrix of variable names, the matrix is flattened to a column vector.
SelectedVariables — Variables to read
[] (default) | string array
Variables to read from the ensemble, specified as a string array. Use this property to specify which variables are extracted to the MATLAB workspace when you use the read command to read data from the current ensemble member. read returns a table row containing a table variable for each name specified in SelectedVariables. For example, suppose that you have an ensemble, fensemble, that contains six variables, and you want to read only two of them, Vibration and FaultState. Set the SelectedVariables property and call read:
fensemble.SelectedVariables = ["Vibration";"FaultState"];
data = read(fensemble)
SelectedVariables can be any combination of the variables in the DataVariables, ConditionVariables, and IndependentVariables properties. If SelectedVariables is empty, read generates an error.
You can specify SelectedVariables using a cell array of character vectors, such as {'Vibration';'Tacho'}, but the variable names are always stored as a string array, ["Vibration";"Tacho"]. If you specify a matrix of variable names, the matrix is flattened to a column vector.
ReadSize — Number of members to read
1 (default) | positive integer
Number of members to read from the ensemble datastore at once, specified as a positive integer that is smaller than the total number of members in the ensemble. By default, the read command returns a one-row table containing data from one ensemble member. To read data from multiple members in a single read operation, set this property to an integer value greater than one. For example, if ReadSize = 3, then read returns a three-row table where each row contains data from a different ensemble member. If fewer than ReadSize members are unread, then read returns a table with as many rows as there are remaining members.
The ensemble datastore property LastMemberRead contains the names of all files read during the most recent read operation. Thus, for instance, if ReadSize = 3, then a read operation sets LastMemberRead to a string vector containing three file names.
When you use writeToLastMemberRead, specify the data to write as a table with a number of rows equal to ReadSize. The writeToLastMemberRead command updates the members specified by LastMemberRead, writing one table row to each specified file.
Changing the ReadSize property also resets the ensemble to its unread state. For instance, suppose that you read some ensemble members one at a time (ReadSize = 1), and then change ReadSize to 3. The next read operation returns data from the first three ensemble members.
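The ReadSize behavior described above can be sketched with a small throwaway ensemble of five members. The folder, file names, and the variables Load and ID are hypothetical. The anonymous read function is an assumption made to keep the sketch short, and it works here only because every stored variable is a scalar.

```matlab
% Build five hypothetical member files, each holding two scalar variables
location = fullfile(tempdir,'fensembleReadSizeDemo');
if ~isfolder(location)
    mkdir(location);
end
for k = 1:5
    Load = 10*k;   % stand-in data variable
    ID = k;        % stand-in condition variable
    save(fullfile(location,sprintf('member_%02d.mat',k)),'Load','ID');
end

fensemble = fileEnsembleDatastore(location,'.mat');
fensemble.DataVariables = "Load";
fensemble.ConditionVariables = "ID";
fensemble.SelectedVariables = ["Load";"ID"];

% Assumed reader: loads the requested scalars and returns a one-row table
fensemble.ReadFcn = @(filename,variables) ...
    struct2table(load(filename,variables{:}));

fensemble.ReadSize = 3;
first = read(fensemble);               % one row for each of three members
filesRead = fensemble.LastMemberRead;  % paths of the three files just read
second = read(fensemble);              % only two members remain unread
```

With five members and ReadSize = 3, the second read returns the two remaining members rather than generating an error.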
NumMembers — Number of members in ensemble
positive integer
This property is read-only.
Number of members in the ensemble, specified as a positive integer.
LastMemberRead — File name of last ensemble member read
"" (default) | string | string array
This property is read-only.
File name of last ensemble member read into the MATLAB workspace, specified as a string or, when more than one member is read at a time, a string array. When you use the read command to read data from an ensemble datastore, the software determines which ensemble member to read next, and reads data from the corresponding file. The LastMemberRead property contains the path to the most recently read file. When the ensemble datastore has not yet been read, or has been reset, LastMemberRead is an empty string.
When you call writeToLastMemberRead to add data back to the ensemble datastore, that function writes to the file specified in LastMemberRead.
By default, read reads data from one ensemble member at a time (the ReadSize property of the ensemble datastore is 1). When ReadSize > 1, LastMemberRead is a string array containing the paths to all files read in the most recent read operation.
Files — List of files in ensemble datastore
string vector
This property is read-only.
List of files in the ensemble datastore, specified as a column string vector of length NumMembers. Each entry contains the full path to a file in the datastore. The files are in the order in which the read command reads ensemble members.
Example: ["C:\Data\Data_01.csv"; "C:\Data\Data_02.csv"; "C:\Data\Data_03.csv"]
Object Functions
The read, writeToLastMemberRead, and subset functions are specialized for Predictive Maintenance Toolbox ensemble data. Other functions, such as reset and hasdata, are identical to those used with datastore objects in MATLAB. To transfer all the member data into a table or cell array with a single command, use readall. To extract specific ensemble members into a smaller or more specialized ensemble datastore, use subset. To partition an ensemble datastore, use the partition(ds,n,index) syntax of the partition function.
read | Read member data from an ensemble datastore |
writeToLastMemberRead | Write data to member of an ensemble datastore |
subset | Create new ensemble datastore from subset of existing ensemble datastore |
reset | Reset datastore to initial state |
hasdata | Determine if data is available to read |
progress | Determine how much data has been read |
readall | Read all data in datastore |
numpartitions | Number of datastore partitions |
partition | Partition a datastore |
tall | Create tall array |
transform | Transform datastore |
isPartitionable | Determine whether datastore is partitionable |
isShuffleable | Determine whether datastore is shuffleable |
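As a hedged illustration of readall and the partition(ds,n,index) syntax mentioned above, the following sketch builds a throwaway ensemble of five members and splits it into two partitions. The folder, file names, variables, and anonymous read function are all hypothetical, and the sketch assumes that each partition exposes the same NumMembers property as the original datastore.

```matlab
% Build five hypothetical member files with scalar variables only
location = fullfile(tempdir,'fensemblePartitionDemo');
if ~isfolder(location)
    mkdir(location);
end
for k = 1:5
    Load = 10*k;
    ID = k;
    save(fullfile(location,sprintf('member_%02d.mat',k)),'Load','ID');
end

fensemble = fileEnsembleDatastore(location,'.mat');
fensemble.DataVariables = "Load";
fensemble.ConditionVariables = "ID";
fensemble.SelectedVariables = ["Load";"ID"];
% Assumed reader, valid here because all requested variables are scalars
fensemble.ReadFcn = @(filename,variables) ...
    struct2table(load(filename,variables{:}));

% Read every member with a single command
allData = readall(fensemble);

% Split the ensemble into two partitions and take each one in turn
part1 = partition(fensemble,2,1);
part2 = partition(fensemble,2,2);
```

Each partition can then be processed independently, for example in separate workers.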
Examples
Create and Configure File Ensemble Datastore
Create a file ensemble datastore for data stored in MATLAB® files, and configure it with functions that tell the software how to read from and write to the datastore.
For this example, you have two data files containing healthy operating data from a bearing system, baseline_01.mat and baseline_02.mat. You also have three data files containing faulty data from the same system, FaultData_01.mat, FaultData_02.mat, and FaultData_03.mat.
unzip fileEnsData.zip  % extract compressed files
location = pwd;
extension = '.mat';
fensemble = fileEnsembleDatastore(location,extension);
Before you can interact with data in the ensemble, you must create functions that tell the software how to process the data files to read variables into the MATLAB workspace and to write data back to the files. Save these functions to a location on the file path. For this example, use the following supplied functions:
readBearingData — Extract requested variables from a structure, bearing, and other variables stored in the file. This function also parses the filename for the fault status of the data. The function returns a table row containing one table variable for each requested variable.
writeBearingData — Take a structure and write its variables to a data file as individual stored variables.
fensemble.ReadFcn = @readBearingData;
fensemble.WriteToMemberFcn = @writeBearingData;
Finally, set properties of the ensemble to identify data variables, condition variables, and selected variables for reading. For this example, the variables in the data file are gs, sr, load, and rate. Suppose that you only need to read the fault label, gs, and sr. Set these variables as the selected variables.
fensemble.DataVariables = ["gs";"sr";"load";"rate"];
fensemble.ConditionVariables = ["label"];
fensemble.SelectedVariables = ["label";"gs";"sr"];
Examine the ensemble. The functions and the variable names are assigned to the appropriate properties.
fensemble
fensemble =
  fileEnsembleDatastore with properties:

                 ReadFcn: @readBearingData
        WriteToMemberFcn: @writeBearingData
           DataVariables: [4x1 string]
    IndependentVariables: [0x0 string]
      ConditionVariables: "label"
       SelectedVariables: [3x1 string]
                ReadSize: 1
              NumMembers: 5
          LastMemberRead: [0x0 string]
                   Files: [5x1 string]
The functions that you assigned tell the read and writeToLastMemberRead commands how to interact with the data files that make up the ensemble. For example, when you call the read command, it uses readBearingData to read all the variables in fensemble.SelectedVariables. For a more detailed example, see File Ensemble Datastore with Measured Data.
Read from and Write to a File Ensemble Datastore
Create a file ensemble datastore for data stored in MATLAB® files, and configure it with functions that tell the software how to read from and write to the datastore. (For more details about configuring file ensemble datastores, see File Ensemble Datastore with Measured Data.)
% Create ensemble datastore that points to data files in current folder
unzip fileEnsData.zip  % extract compressed files
location = pwd;
extension = '.mat';
fensemble = fileEnsembleDatastore(location,extension);

% Specify data and condition variables
fensemble.DataVariables = ["gs";"sr";"load";"rate"];
fensemble.ConditionVariables = "label";

% Configure with functions for reading and writing variable data
fensemble.ReadFcn = @readBearingData;
fensemble.WriteToMemberFcn = @writeBearingData;
The functions tell the read and writeToLastMemberRead commands how to interact with the data files that make up the ensemble. Thus, when you call the read command, it uses readBearingData to read all the variables in fensemble.SelectedVariables. For this example, readBearingData extracts requested variables from a structure, bearing, and other variables stored in the file. It also parses the filename for the fault status of the data.
Specify variables to read, and read them from the first member of the ensemble.
fensemble.SelectedVariables = ["gs";"load";"label"];
data = read(fensemble)
data=1×3 table
label gs load
________ _______________ ____
"Faulty" {5000x1 double} 0
You can now process the data from the member as needed. For this example, compute the average value of the signal stored in the variable gs. Extract the data from the table returned by read.
gsdata = data.gs{1};
gsmean = mean(gsdata);
You can write the mean value gsmean back to the data file as a new variable. To do so, first expand the list of data variables in the ensemble to include a variable for the new value. Call the new variable gsMean.
fensemble.DataVariables = [fensemble.DataVariables;"gsMean"]
fensemble =
  fileEnsembleDatastore with properties:

                 ReadFcn: @readBearingData
        WriteToMemberFcn: @writeBearingData
           DataVariables: [5x1 string]
    IndependentVariables: [0x0 string]
      ConditionVariables: "label"
       SelectedVariables: [3x1 string]
                ReadSize: 1
              NumMembers: 5
          LastMemberRead: "/tmp/Bdoc24b_2725827_2687332/tp232afcfc/predmaint-ex34165887/FaultData_01.mat"
                   Files: [5x1 string]
Next, write the derived mean value to the file corresponding to the last-read ensemble member. (See Data Ensembles for Condition Monitoring and Predictive Maintenance.) When you call writeToLastMemberRead, it converts the data to a structure and calls fensemble.WriteToMemberFcn to write the data to the file.
writeToLastMemberRead(fensemble,'gsMean',gsmean);
Calling read again advances the last-read-member indicator to the next file in the ensemble and reads the data from that file.
data = read(fensemble)
data=1×3 table
label gs load
________ _______________ ____
"Faulty" {5000x1 double} 50
You can confirm that this data is from a different member by examining the load variable in the table. Here, its value is 50, while in the previously read member, it was 0.
You can repeat the processing steps to compute and append the mean for this ensemble member. In practice, it is more useful to automate the process of reading, processing, and writing data. To do so, reset the ensemble to a state in which no data has been read. Then loop through the ensemble and perform the read, process, and write steps for each member.
reset(fensemble)
while hasdata(fensemble)
    data = read(fensemble);
    gsdata = data.gs{1};
    gsmean = mean(gsdata);
    writeToLastMemberRead(fensemble,'gsMean',gsmean);
end
The hasdata command returns false when every member of the ensemble has been read. Now, each data file in the ensemble includes the gsMean variable derived from the data gs in that file. You can use techniques like this loop to extract and process data from your ensemble files as you develop a predictive-maintenance algorithm. For an example illustrating in more detail the use of a file ensemble datastore in the algorithm-development process, see Rolling Element Bearing Fault Diagnosis. The example also shows how to use Parallel Computing Toolbox™ to speed up the processing of large data ensembles.
To confirm that the derived variable is present in the file ensemble datastore, read it from the first and second ensemble members. To do so, reset the ensemble again, and add the new variable to the selected variables. In practice, after you have computed derived values, it can be useful to read only those values without rereading the unprocessed data, which can take significant space in memory. For this example, read selected variables that include the new variable, gsMean, but do not include the unprocessed data, gs.
reset(fensemble)
fensemble.SelectedVariables = ["label";"load";"gsMean"];
data1 = read(fensemble)
data1=1×3 table
label load gsMean
________ ____ ________
"Faulty" 0 -0.22648
data2 = read(fensemble)
data2=1×3 table
label load gsMean
________ ____ ________
"Faulty" 50 -0.22937
Version History
Introduced in R2018a
R2018b: DataVariablesFcn, IndependentVariablesFcn, and ConditionVariablesFcn properties will be removed
The DataVariablesFcn, IndependentVariablesFcn, and ConditionVariablesFcn properties will be removed in a future release. Use the ReadFcn property instead.
The ReadFcn property, introduced in R2018b, lets you specify one function to read all variable types from your ensemble datastore. Formerly, you had to designate functions separately for data variables, independent variables, and condition variables. An advantage of using ReadFcn is that the read operation needs to access each member file only once to read all the variables. With separate functions for each variable type, read opens the file up to three times to read all variable types. Thus, designating a single ReadFcn is a more efficient way to access the datastore.
To update your code to use the new property:
Rewrite your fileEnsembleDatastore read functions into one new function that reads variables of all types. (See Create and Configure File Ensemble Datastore for an example of such a function.)
Set DataVariablesFcn, IndependentVariablesFcn, and ConditionVariablesFcn to [] to clear them.
Set ReadFcn to the new function.