How should I store data from my parfor loop so as to reduce runtime?

Alexander Ink on 16 Jun 2021
Answered: Walter Roberson on 19 Jun 2021
I am currently trying to find the best hyperparameters for training a KNN model on my data. To do this I am testing multiple hyperparameter combinations using nested for loops, with a parfor loop as the innermost loop over the number of iterations. The code looks something like this:
results = table();
for ii = 1:length(nearest_neighbors)
    num = nearest_neighbors(ii);
    for ww = 1:length(weight_metric)
        weight = weight_metric(ww);
        for zz = 1:length(standardize_value)
            standard = standardize_value(zz);
            for dd = 1:length(distance_metric)
                distance = distance_metric(dd);
                parfor jj = 1:iterations
                    % Cross-validated KNN model for this hyperparameter combination
                    knn_MDL = fitcknn(trainData,ResponseVarName,"KFold",20,'NumNeighbors',num, ...
                        'Standardize',standard,'Distance',distance,"DistanceWeight",weight);
                    kfold_N = kfoldLoss(knn_MDL,'mode','average');
                    % One-row table for this iteration, appended to the main table
                    temp = table(jj,num,standard,weight,distance,kfold_N,'VariableNames', ...
                        {'IterationNum','NumNeighbors','Standardize_Values','Weight_Metric','Distance_Metric','KfoldLoss'});
                    results = [results; temp];
                end
            end
        end
    end
end
On every iteration I create a temporary table and append it to the main table. This works fine if the number of hyperparameter combinations times the number of iterations is in the low thousands. However, when I run it with all possible combinations for a large number of iterations, the code can run for 8 hours on a supercomputer and still not finish.
Is there a better way to store the data that minimizes run time?

Answers (1)

Walter Roberson on 19 Jun 2021
Appending data to a table like that is expensive.
If your data is all the same datatype, store it in arrays and convert to a table at the end (or at least much less often).
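A minimal sketch of the array approach, assuming the only per-iteration result is a scalar loss. The values of `iterations` and `num` and the formula standing in for the `kfoldLoss` call are hypothetical placeholders, not part of the original code:

```matlab
iterations = 8;                  % hypothetical number of repetitions
num = 5;                         % hypothetical NumNeighbors value
losses = zeros(iterations, 1);   % preallocated; used as a sliced output variable

parfor jj = 1:iterations
    % Placeholder for kfoldLoss(fitcknn(...)); any scalar result fits here.
    losses(jj) = 1 / (jj + num);
end

% Build the table once, after the loop, instead of appending a row per iteration.
results = table((1:iterations)', repmat(num, iterations, 1), losses, ...
    'VariableNames', {'IterationNum', 'NumNeighbors', 'KfoldLoss'});
```

Because each iteration writes only to `losses(jj)`, the workers do not need to combine tables; the table is created a single time at the end.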
If your data is of mixed datatype, use cell arrays to store it, and either call cell2table() at the end or append 2D arrays of cells to the table.
results = [results; temp];
is defined if temp is a 2D cell array with the same number of columns as results has variables.
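A rough sketch of the cell-array route for mixed datatypes, assuming a single outer loop for brevity. The grid `distance_metric`, the fixed neighbor count, and the formula standing in for the `kfoldLoss` result are hypothetical placeholders. Each parfor fills a preallocated block of cells, and cell2table() is called once at the end:

```matlab
iterations = 4;                                  % hypothetical repetition count
distance_metric = ["euclidean", "cityblock"];    % hypothetical hyperparameter grid
nCols = 4;                                       % columns stored per row
allRows = cell(numel(distance_metric) * iterations, nCols);   % preallocated storage

for dd = 1:numel(distance_metric)
    distance = distance_metric(dd);
    block = cell(iterations, nCols);             % sliced output of the parfor below
    parfor jj = 1:iterations
        loss = 0.1 * jj;                         % placeholder for the kfoldLoss result
        block(jj, :) = {jj, 5, distance, loss};  % one row of mixed-type values
    end
    % Copy the finished block into its slot in the preallocated cell array.
    rowIdx = (dd - 1) * iterations + (1:iterations);
    allRows(rowIdx, :) = block;
end

% Convert to a table a single time, after all loops have finished.
results = cell2table(allRows, 'VariableNames', ...
    {'IterationNum', 'NumNeighbors', 'Distance_Metric', 'KfoldLoss'});
```

The same pattern should extend to the full set of nested loops by computing `rowIdx` from all of the outer loop indices.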
