How to divide data into train/valid/test sets such that one sample from every class is selected?
BR on 20 Nov 2019
Commented: Ridwan Alam on 21 Nov 2019
Hello all,
I am trying to partition a dataset into training and test sets such that at least one sample from every class appears in both the training set and the test set.
To do this, I call cvpartition in a loop and check whether every class has been selected by comparing the classes found in each test fold with the full set of classes in the data. This is what I have done so far:
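s2 = data(:,1); % target vector (class labels) in data
s2_1 = unique(data(:,1)); % unique class labels; length(s2_1) is the number of classes
for m = 1 : 1000
    cv = cvpartition(data(:,1),'KFold',5,'Stratify',false);
    for i = 1:cv.NumTestSets
        testClasses = s2(cv.test(i)); % labels in the i-th test fold
        [~,~,idx] = unique(testClasses);
        nCount{i} = accumarray(idx(:),1); % sample count per class present in this fold
    end
    for n = 1 : 5
        if length(nCount{1,n})==length(s2_1) % fold n contains every class
            break % exits only the inner loop over n, not the loop over m
        end
    end
end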
There is a problem with the break statement here, but I can work that out. The major problem is that I don't get a proper result, and I am uncertain about the maximum number of iterations (e.g., 1000) that the loop should run.
I hope I have explained the problem clearly.
Thanks in advance.
Accepted Answer
Ridwan Alam on 20 Nov 2019
Edited: Ridwan Alam on 20 Nov 2019
  
Set the 'Stratify' option to true.
cv = cvpartition(data(:,1),'KFold',5,'Stratify',true);
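With stratification, each fold keeps roughly the same class proportions as the full dataset, so no retry loop should be needed. The sketch below is one way to verify that every class appears in both the training and test set of each fold. It assumes each class has at least as many samples as there are folds (otherwise some test folds cannot contain every class), and it uses made-up example data: the variables `labels`, `classes`, and `allPresent` are illustrative names, not part of the original question.

```matlab
% Hypothetical example data: column 1 holds class labels
% (3 classes, 10 samples each); the other columns are random features.
rng(0);                                   % fixed seed for repeatability
labels = repelem((1:3)', 10);             % assumed labels, for illustration only
data = [labels, randn(numel(labels), 2)];

k = 5;
classes = unique(data(:,1));              % distinct class labels
cv = cvpartition(data(:,1), 'KFold', k, 'Stratify', true);

% For each fold, record whether the training set (column 1) and
% the test set (column 2) contain every class.
allPresent = true(cv.NumTestSets, 2);
for i = 1:cv.NumTestSets
    trainLabels = data(training(cv, i), 1);
    testLabels  = data(test(cv, i), 1);
    allPresent(i, 1) = all(ismember(classes, trainLabels));
    allPresent(i, 2) = all(ismember(classes, testLabels));
end
disp(allPresent)
```

If any entry of `allPresent` is false on real data, the likely cause is a class with fewer samples than folds; reducing `k` or merging rare classes are possible workarounds.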