Does 'dividerand' really destroy time series data autocorrelations?
Calvin
on 6 Jul 2014
Commented: Greg Heath on 25 Nov 2014
I’ve seen this claim stated many times regarding the training of NARX networks, and I need to understand more about its basis. At first the argument made sense, so I used 'divideind' for NARX training. But after further thought and some experimentation, I’m not so sure.
When training a narxnet with 'dividerand' data partitioning (net.divideFcn = 'dividerand'), does the MATLAB code actually split the data at random into separate training, validation and testing datasets for independent narxnet calculations? Or does it preserve the time sequence of all the inputs and targets and simply mask out the irrelevant partitions before computing the performance statistics for each partition?
If the latter, then I don’t see how ‘dividerand’ would destroy the serial correlations.
I’ve not seen anything in the NN Toolbox documentation warning users to avoid using ‘dividerand’ for narxnets. If anyone knows the code well or has done some testing to confirm, please advise! I suspect this topic is of interest to many.
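One way I can think of to test this empirically (just a sketch, using the toolbox's simpleseries_dataset example and a default narxnet; the property values I expect to see are my guess, which is why I'm asking):
[X,T] = simpleseries_dataset;
net = narxnet(1:2,1:2,10);
net.divideFcn = 'dividerand'; % the division in question
net.trainParam.showWindow = false;
[x,xi,ai,t] = preparets(net,X,{},T);
[net,tr] = train(net,x,t,xi,ai);
% If the division is only a mask over time steps, net.divideMode should be
% 'time' and tr.trainInd/valInd/testInd should index time steps of the one
% preserved sequence rather than point to independently built datasets.
net.divideMode
tr.trainInd(1:10)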
Cal
0 Comments
Accepted Answer
Greg Heath
on 10 Oct 2014
In general, random divisions cannot maintain the auto- and cross-correlation relationships.
Just think about it.
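A rough numerical illustration (only a sketch; the AR(1) series below is a hypothetical example, not anyone's data):
rng('default')
N = 1000;
y = filter(1,[1 -0.9],randn(1,N)); % strongly autocorrelated AR(1) series
[~,valind,~] = dividerand(N,0.7,0.15,0.15); % random 15% validation subset
yv = y(sort(valind)); % subset values, kept in time order
rfull = corrcoef(y(1:end-1),y(2:end)); % lag-1 autocorrelation of the full series
rval = corrcoef(yv(1:end-1),yv(2:end)); % correlation of adjacent subset points
rfull(1,2) % close to 0.9
rval(1,2) % noticeably weaker: adjacent subset points are ~6.7 steps apart on average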
Greg
2 Comments
Greg Heath
on 25 Nov 2014
1. You are correct.
2. My ASSUMPTION that the default random data division in time series functions results in a DESTROYED ORDERING of points within each of the trn/val/tst subsets is INCORRECT.
3. The only effect is to RANDOMIZE the INCREASED SPACING between data points. For the default 0.7/0.15/0.15 division the average spacings are
meantrnspacing = 1/0.7 = 1.4286
meanvalspacing = 1/0.15 = 6.6667
meantstspacing = 1/0.15 = 6.6667
4. Using the dividerand documentation example, estimates of the summary statistics for the spacing are given below:
rng('default')
[trnind,valind,tstind] = dividerand(250,0.7,0.15,0.15);
difftrn = diff(trnind);
diffval = diff(valind);
difftst = diff(tstind);
mindifftrn = min(difftrn) % 1
mindiffval = min(diffval) % 1
mindifftst = min(difftst) % 1
meddifftrn = median(difftrn) % 1
meddiffval = median(diffval) % 6
meddifftst = median(difftst) % 4
meandifftrn = mean(difftrn) % 1.45
meandiffval = mean(diffval) % 6.29
meandifftst = mean(difftst) % 6.43
stddifftrn = std(difftrn) % 0.76
stddiffval = std(diffval) % 4.63
stddifftst = std(difftst) % 5.68
maxdifftrn = max(difftrn) % 4
maxdiffval = max(diffval) % 14
maxdifftst = max(difftst) % 19
5. Therefore, the val and tst performances may not be good predictors of performance on unseen data.
6. If the summary statistics of the time series are stationary, DIVIDEBLOCK should be a much better choice.
7. Therefore, when searching for the cause of poor performance, compare the summary statistics (including auto- and cross-correlations) of the trn/val/tst subsets (see the sketch below).
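A minimal sketch of such a comparison (divideblock and its calling syntax are from the toolbox; the series y and the lag-1 autocorrelation check are illustrative assumptions):
rng('default')
N = 250;
y = filter(1,[1 -0.8],randn(1,N)); % hypothetical stationary series
[trnind,valind,tstind] = divideblock(N,0.7,0.15,0.15);
subsets = {y(trnind), y(valind), y(tstind)};
names = {'trn','val','tst'};
for k = 1:3
    s = subsets{k};
    r = corrcoef(s(1:end-1),s(2:end)); % lag-1 autocorrelation estimate
    fprintf('%s: mean %6.2f  std %5.2f  lag-1 autocorr %5.2f\n', names{k}, mean(s), std(s), r(1,2));
end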
Hope this helps.
Greg
More Answers (0)