Partition of data based on percentages (for cross-validation)

Lulu Dulac on 1 Jul 2017
Hi all,
I have a matrix A made of x rows and y columns.
I would like to take 80% of my matrix A based on the number of rows, and do so 5 times, so as to equally partition my data set. So A1 would be the first 80% of the rows (and all columns), etc.
I had a look at this article https://uk.mathworks.com/help/nnet/ug/divide-data-for-optimal-neural-network-training.html but I am not sure any of these functions does what I want.
Please could anyone help me?
Thanks a lot

Answers (4)

Walter Roberson on 1 Jul 2017
If you are using a neural network, then you would set net.divideFcn to 'dividerand' (the default) and set net.divideParam to the ratios you want.
Otherwise, use
nrow = size(A,1);
ntrain = floor(nrow * 80/100);
train_ind = randperm(nrow, ntrain);
train_rows = A(train_ind, :);
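To take 80% five times so that the rows are covered evenly, one option is to shuffle the rows once and cut them into five folds. The following is only a sketch and needs no toolbox. The example matrix, the variable names (fold_id, train_sets, test_sets), and the fixed seed are assumptions made for illustration; rng requires R2011a or later.

```matlab
% Hypothetical example data: 23 rows, 4 columns (stands in for A).
A = rand(23, 4);
k = 5;                                  % number of folds
nrow = size(A, 1);

rng(0);                                 % assumed fixed seed, for a repeatable split
perm = randperm(nrow);                  % random order of the row indices
fold_id = zeros(nrow, 1);
fold_id(perm) = mod(0:nrow-1, k) + 1;   % deal the shuffled rows out to folds 1..k

train_sets = cell(k, 1);
test_sets  = cell(k, 1);
for f = 1:k
    test_mask = (fold_id == f);         % roughly 20% of the rows
    test_sets{f}  = A(test_mask, :);
    train_sets{f} = A(~test_mask, :);   % the remaining ~80%, all columns
end
```

Each row lands in exactly one test set, so the five 80% training subsets leave out every row once. When nrow is not a multiple of 5, the fold sizes differ by at most one row.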
10 Comments
Walter Roberson on 3 May 2019
For Holdout, M is not the percentage to hold out: it is the fraction.
If you want the train and test logical index vectors, you can get those directly from crossvalind:
[trainIdx, testIdx] = crossvalind('Holdout', FeatureLabSHUFFLE, M);
MA-Winlab on 3 May 2019
Edited: MA-Winlab on 3 May 2019
Yes @Walter Roberson, thank you for the note. M should be within the range [0 1].
One more question:
With Holdout, we do not have a loop like the one with k-fold, so how do we do cross-validation?
I mean, with Holdout, we do not repeat the partitioning, training, and testing k times as with k-fold.
I tried it in a loop with k=5 and kept checking cp.CorrectRate and cp.LastCorrectRate. They were the same, i.e. cp.CorrectRate is not rolling (accumulating) across iterations.
Please correct me if I am mistaken.
I am saying this because I read this
Using this method within a loop is similar to using K-fold cross-validation one time outside the loop, except that nondisjointed subsets are assigned to each evaluation.
This note is from here
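The difference described in the quoted note can be seen with a small toolbox-free sketch. It does not use crossvalind or classperf; it only counts how often each observation ends up in a test set. The sizes and variable names are assumptions chosen for illustration.

```matlab
nrow = 20;                 % hypothetical number of observations
k = 5;                     % number of repetitions / folds
p = 0.2;                   % fraction held out for testing
ntest = round(p * nrow);

% Repeated holdout: a fresh random test set every time, so sets may overlap.
holdout_counts = zeros(nrow, 1);
for r = 1:k
    test_idx = randperm(nrow, ntest);
    holdout_counts(test_idx) = holdout_counts(test_idx) + 1;
end

% k-fold: one shuffle, then disjoint test sets that cover every row once.
perm = randperm(nrow);
kfold_counts = zeros(nrow, 1);
for f = 1:k
    test_idx = perm(f:k:nrow);
    kfold_counts(test_idx) = kfold_counts(test_idx) + 1;
end

% Column 1: times tested under repeated holdout; column 2: under k-fold.
disp([holdout_counts kfold_counts]);
```

With k-fold every observation is tested exactly once. With repeated holdout some observations are typically tested several times and others never, which is what "nondisjointed subsets" refers to.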



Greg Heath on 2 Jul 2017
Edited: Greg Heath on 25 Jul 2017
1. NOTE: Contrary to most statistical regression subroutines, MATLAB Neural Network subroutines operate on COLUMN VECTORS!
2. For N O-dimensional "O"utput target vectors corresponding to N I-dimensional "I"nput vectors:
[ I N ] = size(input)
[ O N ] = size(target)
3. Correspondingly, the data in the MATLAB NN database is stored columnwise. See the results of the keyboard commands
help nndatabase
doc nndatabase
4. The MATLAB NN Toolbox DEFAULT data division ratio is 0.7/0.15/0.15 with
Ntst = floor(0.15*N)
Nval = Ntst
Ntrn = N - Nval - Ntst
5. Instead of TRYING to evenly divide the data for m-fold crossvalidation, it is far easier to just use Ntrials designs with RANDOM datadivision AND RANDOM initial weights.
6. I gave up the nitpicking index considerations of worrying about the number of times each data point was in each of the trn/val/tst subsets. If you have concerns, just increase Ntrials!
7. Somewhere in several of my NEWSGROUP and/or ANSWERS posts, I did use nitpicking XVAL index considerations. Good Luck if you want to find some. I would first search using XVAL.
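The repeated random data division described in the list above could look roughly like the sketch below. It assumes column-wise data as in the notes, uses the 0.7/0.15/0.15 sizes given there, and needs no toolbox. The data, the value of Ntrials, and the variable names x and t are hypothetical, and the training step is left out.

```matlab
% Hypothetical column-wise data: I = 3 inputs, O = 1 output, N = 100 samples.
x = rand(3, 100);
t = rand(1, 100);
[I, N] = size(x);

Ntst = floor(0.15 * N);
Nval = Ntst;
Ntrn = N - Nval - Ntst;

Ntrials = 10;                           % assumed number of repeated designs
for trial = 1:Ntrials
    idx = randperm(N);                  % new random division for every trial
    trn_idx = idx(1:Ntrn);
    val_idx = idx(Ntrn+1 : Ntrn+Nval);
    tst_idx = idx(Ntrn+Nval+1 : end);

    xtrn = x(:, trn_idx);   ttrn = t(:, trn_idx);
    xval = x(:, val_idx);   tval = t(:, val_idx);
    xtst = x(:, tst_idx);   ttst = t(:, tst_idx);
    % ... design and evaluate one model with random initial weights here ...
end
```

Because the samples are stored in columns, the index vectors select columns rather than rows.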
Hope this helps.
Thank you for formally accepting my answer
Greg

Lulu Dulac on 2 Jul 2017
Thank you very much for your answer. Unfortunately I can't use dividerand() as I don't have MATLAB R2017a.
I have tried with the code you proposed. I understand this part:
nrow = size(A,1);
ntrain = floor(nrow * 80/100);
But then how do I make sure I take 80% five times so that my data is evenly covered? I will also have to compare two matrices later (A and B) and have to take the same 80% of each, five times...
Thank you a lot for your answer
4 Comments
Greg Heath on 2 Jul 2017
Edited: Greg Heath on 2 Jul 2017
This should help:
SEARCH TERM         NEWSGROUP HITS   ANSWERS HITS
CROSSVAL GREG       12               14
CROSSVAL            49               114
CROSSVALIND GREG    7                12
CROSSVALIND         46               77
CVPARTITION GREG    11               14
CVPARTITION         40               106
Greg
Walter Roberson on 3 Jul 2017
"I don't have NN toolbox"
Then that was the operative limitation, not the fact that you are not using R2017a.
In my Answer I posted code for row-wise random division without any toolboxes. I did use a syntax of randperm that did not become available until R2011a.
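For taking the same 80% of two matrices five times, one possibility is to generate the row indices once and apply them to both. The sketch below is an illustration only: the matrices, sizes, and variable names are hypothetical, and it uses the one-argument form of randperm, which is older than the randperm(n,k) syntax mentioned above.

```matlab
% Hypothetical matrices with the same number of rows.
A = rand(50, 3);
B = rand(50, 7);
assert(size(A, 1) == size(B, 1), 'A and B must have the same number of rows');

nrow = size(A, 1);
k = 5;
perm = randperm(nrow);                     % shuffle the row indices once

for f = 1:k
    test_idx  = perm(f:k:nrow);            % every k-th shuffled row, ~20%
    train_idx = setdiff(perm, test_idx);   % remaining ~80% (returned sorted)

    A_train = A(train_idx, :);   B_train = B(train_idx, :);
    A_test  = A(test_idx, :);    B_test  = B(test_idx, :);
    % ... compare A and B on this fold here ...
end
```

Since train_idx and test_idx are shared, row i of A is always paired with row i of B in every fold.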



ranjana roy chowdhury on 14 Jul 2019
I have a dataset of size 339 * 5825. I want to set 4% of the dataset values to 0, excluding the entries that are -1. Please help me.
2 Comments
Greg Heath on 14 Jul 2019
Start a new file and provide more details.
Greg
ranjana roy chowdhury on 15 Jul 2019
The dataset is the WS Dream dataset, of size 339*5825. The entries have values between 0 and 0.1; a few entries are -1. I want to make 96% of this dataset 0, excluding the entries that are -1.
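One possible reading of this request is sketched below: pick a random fraction of the entries that are not -1 and set them to 0. The stand-in matrix, the variable names, and the interpretation of the fraction (taken relative to the entries that are not -1) are all assumptions; the two posts mention 4% and 96%, so frac is left as a parameter.

```matlab
% Hypothetical stand-in for the 339-by-5825 data set (smaller for speed).
D = 0.1 * rand(30, 40);
D(rand(size(D)) < 0.05) = -1;            % mark a few entries as -1

frac = 0.04;                             % assumed fraction to zero (use 0.96 for 96%)
candidates = find(D ~= -1);              % linear indices of entries that are not -1
nzero = round(frac * numel(candidates));
pick = candidates(randperm(numel(candidates), nzero));

D_zeroed = D;
D_zeroed(pick) = 0;                      % entries equal to -1 are never touched
```

The original matrix D is kept unchanged so that the zeroed entries can be compared with their true values afterwards.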
