How to split a dataset into 3 sets with splitEachLabel using percentages so that each class appears in all 3 sets?

I have an image dataset with around 100 classes; the largest class has 59 images and the smallest has 5. I am trying to split the data into training, validation, and test sets with the following statement:
[imdsTrain,imdsValidation, imdsTest] = splitEachLabel(imds,0.75,0.15,'randomize');
I got an error saying that the training and validation data must have the same labels.
I checked the datastores and found that for classes with few images, such as 5, the split puts 4 images in the training set and the remaining 1 sometimes in the validation set and sometimes in the test set. As a result, some classes that are in the training set are missing from the validation or test set.
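The check described here can be scripted rather than done by eye. The following is a minimal sketch, assuming the three datastores from the statement above already exist; `missingClasses` is a hypothetical helper name, not a toolbox function.

```matlab
function missing = missingClasses(referenceLabels, splitLabels)
% Hypothetical helper: return the classes that occur in referenceLabels
% but do not occur in splitLabels. Both inputs are categorical vectors.
%
% Usage, assuming the datastores from the question:
%   missingClasses(imdsTrain.Labels, imdsValidation.Labels)
%   missingClasses(imdsTrain.Labels, imdsTest.Labels)
missing = setdiff(unique(referenceLabels), unique(splitLabels));
end
```

An empty result means every training class is also present in the other set. `countEachLabel` on each datastore gives the per-class counts if more detail is needed.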
I worked around it by increasing the validation fraction from 0.15 to 0.2, but that doesn't seem like a good solution.
Is there a way to split the dataset so that all classes are present in all 3 sets? Preferably I want to do it with percentages rather than integer counts, which would always put exactly 1 image per class in the validation and test sets.

Answers (1)

Anmol Dhiman on 3 Jul 2020
Edited: Anmol Dhiman on 3 Jul 2020
Hi Faisal,
The second argument (0.75) in splitEachLabel is the proportion of files to split, specified as a scalar in the interval (0,1) or a positive integer scalar. You can change its value for your problem.
Regards,
Anmol Dhiman
1 Comment
Muhammad Faisal on 7 Jul 2020
This is already known and I applied it. I'll try to explain the problem below with a simple example.
Suppose there are 5 classes A, B, C, D, E. Each class has some images in its folder (an unbalanced dataset). After using the function, the training data has all 5 classes, but only 3 or 4 classes appear in the validation set, say A, B, C, D. Similarly, only a few classes appear in the test set, say A, B, E.
This causes a problem when I use trainNetwork with ValidationData, because the training and validation labels must be the same. I need all classes in all partitions.
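One possible workaround, sketched here under stated assumptions rather than as a documented splitEachLabel feature, is to compute the split indices per class yourself: use the percentages where they yield at least one file, and fall back to a minimum of one file for the validation and test sets otherwise. `splitIndicesPerClass` is a hypothetical helper name, and the sketch assumes every class has at least 3 files and that your release supports `subset` on an imageDatastore.

```matlab
function [idxTrain, idxVal, idxTest] = splitIndicesPerClass(labels, pTrain, pVal)
% Hypothetical helper (not part of any toolbox): per-class split that keeps
% at least one file of every class in each of the three sets.
%   labels        categorical vector, e.g. imds.Labels
%   pTrain, pVal  fractions in (0,1); the test fraction is the remainder
%
% Usage, assuming imds is the imageDatastore from the question:
%   [iTr, iVa, iTe] = splitIndicesPerClass(imds.Labels, 0.75, 0.15);
%   imdsTrain      = subset(imds, iTr);
%   imdsValidation = subset(imds, iVa);
%   imdsTest       = subset(imds, iTe);

pTest   = 1 - pTrain - pVal;
labels  = labels(:);                  % force a column vector
classes = unique(labels);             % only classes that actually occur
idxTrain = []; idxVal = []; idxTest = [];

for k = 1:numel(classes)
    idx = find(labels == classes(k));
    idx = idx(randperm(numel(idx)));  % shuffle within the class
    n   = numel(idx);
    if n < 3
        error('Class %s has %d file(s); at least 3 are needed.', ...
              char(classes(k)), n);
    end
    % Percentage-based counts, clamped so every set gets at least one file.
    nVal   = min(max(1, round(pVal  * n)), n - 2);
    nTest  = min(max(1, round(pTest * n)), n - 1 - nVal);
    nTrain = n - nVal - nTest;        % remainder, always >= 1

    idxTrain = [idxTrain; idx(1:nTrain)];              %#ok<AGROW>
    idxVal   = [idxVal;   idx(nTrain+1:nTrain+nVal)];  %#ok<AGROW>
    idxTest  = [idxTest;  idx(nTrain+nVal+1:end)];     %#ok<AGROW>
end
end
```

For a class with 5 images and fractions 0.75 and 0.15, this gives 3 training, 1 validation, and 1 test image, so small classes deviate from the requested percentages in exchange for appearing in every set. Call `rng` with a fixed seed beforehand if the split needs to be reproducible.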
