signalDatastore for a large dataset for feedforward training
I'm trying to train a feedforward net with a very large number of files and data (approx. 13k files with more than 3000 rows each). Since I can't fit all the data in a single matrix for training, I tried to build a signal datastore and pass it to the network, but I always receive the same error: 'Error using trainNetwork (line 191)
Invalid training data. The output size (1) of the last layer does not match the response size (2201).
Error in NN_datastore_v2 (line 76)
net=trainNetwork(sdsTrain, layers,options);'.
Where is the mistake? I suppose it's in the read function, maybe in the format of its output? I tried several options but I can't seem to get the right combination. Please help.
Here's the full code:
clear all;
sds=signalDatastore(Folders,"IncludeSubfolders",true,"ReadFcn", @dataproc, 'FileExtensions','.txt');
numFiles = numel(sds.Files);
rng('default'); % For reproducibility
fileIndices = randperm(numFiles);
trainRatio = 0.7;
valRatio = 0.15;
numTrain = floor(trainRatio * numFiles);
numVal = floor(valRatio * numFiles);
% Indices for each set
trainIdx = fileIndices(1:numTrain);
valIdx = fileIndices(numTrain+1:numTrain+numVal);
testIdx = fileIndices(numTrain+numVal+1:end);
% Create the sub-datastores
sdsTrain = subset(sds, trainIdx);
sdsVal = subset(sds, valIdx);
sdsTest = subset(sds, testIdx);
layers = [
featureInputLayer(429, "Normalization", "zscore")
options = trainingOptions('adam', ...
'MaxEpochs', 1000, ...
'MiniBatchSize', 64, ...
net=trainNetwork(sdsTrain, layers,options);
function data=dataproc(filename)
% opts = detectImportOptions(filename, 'Delimiter','\t');
opts=delimitedTextImportOptions("NumVariables", 442);
opts.Delimiter = "\t";
fixedVariableNames = [******];
dynamicVariableNames = "Gage" + string(1:429);
opts.VariableNames = [fixedVariableNames, dynamicVariableNames];
opts.VariableTypes = repmat("double", 1, 442);
opts=setvaropts(opts, "DecimalSeparator", ",");
tableData = readtable(filename, opts);
if size(dataNumeric,1) <l_max
% if size(dataNumeric,1) > l_max
Fz= dataNumeric(300:l_max,strcmp(fixedVariableNames, 'FzN'));
lambdas = dataNumeric(300:l_max, 14:end);
[b_butter, a_butter] = butter(7, 0.03); % Low-pass filter
window_size = 5; % Window for the median filter
outlierIndices = isoutlier(lambdas, 'mean');
lambdas(outlierIndices) = nan;
lambdas = fillmissing(lambdas, 'linear');
strain_filt = medfilt1(lambdas, window_size);
filtered_force = filtfilt(b_butter, a_butter, Fz);
% data.X =strain_filt;
% data.Y =filtered_force;
data = {strain_filt, filtered_force};
% end
on 19 Dec 2024
Hi Daniele, could you please provide the data you're using to train the network?
on 23 Dec 2024
As I understand it, each of your files has 2201 samples, but the network outputs only one sample because the number of neurons in the last "fullyConnectedLayer" is 1. Please replace this line of code with the following code.
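The replacement snippet itself is not shown in this thread. A minimal sketch of the change described, assuming the network ends in a fully connected layer followed by a regression layer, could look like the following. The hidden layer and its size are hypothetical placeholders, because the full layer array from the question is not available; only the 429 input features and the 2201 response size come from the post.

```matlab
% Sketch only, requires Deep Learning Toolbox.
numFeatures  = 429;    % from featureInputLayer(429) in the question
numResponses = 2201;   % response size reported in the error message

layers = [
    featureInputLayer(numFeatures, "Normalization", "zscore")
    fullyConnectedLayer(64)            % hypothetical hidden layer
    reluLayer                          % hypothetical activation
    fullyConnectedLayer(numResponses)  % was fullyConnectedLayer(1)
    regressionLayer                    % assumed final layer
];
```

With this change the network predicts all 2201 force values of a file at once, which is only meaningful if every file yields exactly 2201 rows after preprocessing.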
This would most probably solve the issue you are facing. I have not implemented the code at my end, as I do not have access to the input data.
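The size mismatch in the error message can also be inspected directly on the output of the read function. The sketch below uses zero-filled stand-in arrays with the sizes implied by the question (2201 rows, 429 strain channels) and compares the 1-by-2 cell that dataproc currently returns with a per-row layout in which every time step is its own observation. Whether trainNetwork accepts the per-row layout when it is read through signalDatastore is an untested assumption; the predictors are stored as 429-by-1 column vectors because that is the shape the datastore documentation describes for feature data.

```matlab
% Stand-in arrays with the sizes implied by the question (assumed values).
numRows     = 2201;
numFeatures = 429;
strain_filt    = zeros(numRows, numFeatures);
filtered_force = zeros(numRows, 1);

% Current output of dataproc: a 1x2 cell, i.e. a single row whose
% response holds all 2201 force values.
data = {strain_filt, filtered_force};
responseLength = numel(data{1, 2});

% Per-row alternative: one row per time step, so each response is a scalar
% and matches a last layer with a single output.
predictors = num2cell(strain_filt.', 1).';   % 2201x1 cell, each entry 429x1
responses  = num2cell(filtered_force);       % 2201x1 cell, each entry scalar
dataPerRow = [predictors, responses];        % 2201x2 cell array

fprintf('Single-row layout: %d row(s), response length %d\n', ...
    size(data, 1), responseLength);
fprintf('Per-row layout: %d row(s), response length %d\n', ...
    size(dataPerRow, 1), numel(dataPerRow{1, 2}));
```

The per-row layout keeps the original fullyConnectedLayer(1) and does not require every file to have the same number of rows, at the cost of treating the time steps within a file as independent observations.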
For more information about "fullyConnectedLayer", please refer to the link below.
Hope you find this information helpful!
