Abnormal outputs in neural network blind tests (new tests after the net is trained)

The data division in my ANN model is trn 60 - val 20 - tst 20. All of my input and target values are numeric and lie between 0 and 1. The problem I am facing is this: after the network is trained, I feed in a new data set (new values of the five input variables, 35 samples) to estimate the output variable, since the net was already trained with the target variable. Unfortunately, the output is exactly the same for all 35 samples. That should not happen, so I must have done something wrong. Could you please shed some light on this? I am using MATLAB R2012b. The MATLAB code is as follows:
Inp_var1 = xlsread('Training data.xlsx','B2:B165241');
Inp_var2 = xlsread('Training data.xlsx','D2:D165241');
Inp_var3 = xlsread('Training data.xlsx','C2:C165241');
Inp_var4 = xlsread('Training data.xlsx','E2:E165241');
Inp_var5 = xlsread('Training data.xlsx','F2:F165241');
Tar_var1 = xlsread('Training data.xlsx','K2:K165241');
Input(1,:) = Inp_var1;
Input(2,:) = Inp_var2;
Input(3,:) = Inp_var3;
Input(4,:) = Inp_var4;
Input(5,:) = Inp_var5;
Target(1,:) = Tar_var1;
net = feedforwardnet;
net = configure(net,Input,Target);
net.layers{1}.transferFcn = 'tansig';
net.layers{1}.initFcn = 'initnw';
net.layers{2}.transferFcn = 'purelin';
net.layers{2}.initFcn = 'initnw';
net = init(net);
net.IW{1,1}
net.b{1}
net.adaptFcn = 'adaptwb';
net.inputWeights{1,1}.learnFcn = 'learnp';
net.biases{1}.learnFcn = 'learnp';
inputs = Input;
targets = Target;
hiddenLayerSize = 3; % number of hidden neurons
net = fitnet(hiddenLayerSize);
net.inputs{1}.processFcns = {'removeconstantrows','mapminmax','mapstd'};
net.outputs{2}.processFcns = {'removeconstantrows','mapminmax','mapstd'};
net.divideFcn = 'dividerand';
net.divideMode = 'sample';
net.divideParam.trainRatio = 60/100;
net.divideParam.valRatio = 20/100;
net.divideParam.testRatio = 20/100;
net.trainFcn = 'trainlm';
net.performFcn = 'mse';
net.plotFcns = {'plotperform','plottrainstate','ploterrhist', ...
'plotregression', 'plotfit'};
net.efficiency.memoryReduction = 1;
[net,tr] = train(net,inputs,targets);
outputs = net(inputs);
errors = gsubtract(targets,outputs);
performance = perform(net,targets,outputs);
trainTargets = targets .* tr.trainMask{1};
valTargets = targets .* tr.valMask{1};
testTargets = targets .* tr.testMask{1};
trainPerformance = perform(net,trainTargets,outputs)
valPerformance = perform(net,valTargets,outputs)
testPerformance = perform(net,testTargets,outputs)
net.trainParam.epochs;
net.trainParam.time;
net.trainParam.goal;
net.trainParam.min_grad;
net.trainParam.mu_max;
net.trainParam.max_fail;
net.trainParam.show;
end

Accepted Answer

Most of your code is useless. It is equivalent to
Input(1,:) = xlsread('Training data.xlsx','B2:B165241');
Input(2,:) = xlsread('Training data.xlsx','D2:D165241');
Input(3,:) = xlsread('Training data.xlsx','C2:C165241');
Input(4,:) = xlsread('Training data.xlsx','E2:E165241');
Input(5,:) = xlsread('Training data.xlsx','F2:F165241');
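Target(1,:) = xlsread('Training data.xlsx','K2:K165241');
inputs = Input;   % 5 x N input matrix, one column per sample
targets = Target; % 1 x N target matrix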
hiddenLayerSize = 3; % number of hidden neurons
net = fitnet(hiddenLayerSize);
net.inputs{1}.processFcns = {'removeconstantrows','mapminmax','mapstd'};
net.outputs{2}.processFcns = {'removeconstantrows','mapminmax','mapstd'};
net.divideParam.trainRatio = 60/100;
net.divideParam.valRatio = 20/100;
net.divideParam.testRatio = 20/100;
[net,tr] = train(net,inputs,targets);
outputs = net(inputs);
errors = gsubtract(targets,outputs);
performance = perform(net,targets,outputs);
trainTargets = targets .* tr.trainMask{1};
valTargets = targets .* tr.valMask{1};
testTargets = targets .* tr.testMask{1};
trainPerformance = perform(net,trainTargets,outputs)
valPerformance = perform(net,valTargets,outputs)
testPerformance = perform(net,testTargets,outputs)
Make sure the input summary statistics (mean and covariance matrix) of the original and new data are close enough that both sets can be assumed to be drawn from the same probability distribution.
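A minimal sketch of that comparison, using base MATLAB only. The random matrices below are hypothetical stand-ins for the real 5 x N data, and `zshift` is just one illustrative way to judge "close enough", not a formal test.

```matlab
% Hypothetical stand-in data; replace with the real 5 x N matrices.
rng(0);
x    = rand(5, 1000);   % original inputs, one column per sample
xnew = rand(5, 35);     % new inputs, same row order as x

% Row-wise means (one value per input variable).
meanx    = mean(x, 2);
meanxnew = mean(xnew, 2);

% cov expects one observation per ROW, hence the transposes.
covx    = cov(x');
covxnew = cov(xnew');

% Shift of the new mean, expressed in training standard deviations.
zshift = (meanxnew - meanx) ./ std(x, 0, 2);

% Relative difference between the two covariance matrices.
covdiff = norm(covx - covxnew, 'fro') / norm(covx, 'fro');

disp([meanx meanxnew zshift])
disp(covdiff)
```

Large entries in `zshift` (several standard deviations) or a large `covdiff` would suggest that the new inputs do not resemble the training inputs.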
Thank you for formally accepting my answer
Greg

6 Comments

Thanks for answering the question, Greg. I know that I have written many extra lines in the code, but that may not be the problem anyway.
Let me ask a few more questions:
1) Do I have to rescale the new input data (the blind test data)? If so, could you please guide me on how to do that?
2) Please have a look at this particular issue: my training dataset is a 165240 x 5 matrix, and the blind test dataset is a 35 x 5 matrix. In both cases the output is a single variable: 165240 x 1 and 35 x 1, respectively. The blind test dataset is very small compared to the training one. Are the abnormal output values caused by this?
3) Do the original dataset and the new dataset always have to be drawn from the same probability distribution?
3. If they are plotted, the boundaries of the original should contain the new.
1. Not if 3 is true. If the original data was scaled inside train, the net should automatically scale new data. Please explain the adjective "blind". The adjective "test" already indicates that the training algorithm has never taken it into account, regardless of whether it is original or new.
2. Reverse your notation: the training set is 5 x 165240, etc.
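Points 3 and 2 can be checked numerically instead of by plotting. The sketch below uses hypothetical stand-in data in the 5 x N orientation (a 35 x 5 block read from a spreadsheet would first need a transpose), with one new sample deliberately placed out of range. `bsxfun` is used so that it also runs on older releases such as R2012b.

```matlab
% Hypothetical stand-in data; replace with the real matrices.
rng(0);
x    = rand(5, 1000);                  % original inputs, 5 x N
xnew = [rand(5, 34), 1.5*ones(5, 1)];  % 5 x 35; last column is out of range

% If the new data were read as 35 x 5, transpose first: xnew = xnew';

% Per-variable bounds of the original data.
xmin = min(x, [], 2);
xmax = max(x, [], 2);

% Flag every new sample with at least one variable outside the bounds.
below   = bsxfun(@lt, xnew, xmin);
above   = bsxfun(@gt, xnew, xmax);
outside = any(below | above, 1);

fprintf('%d of %d new samples fall outside the training range\n', ...
    nnz(outside), numel(outside));
```

Samples flagged in `outside` require the net to extrapolate, so their outputs should be treated with caution.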
mmenvo on 25 Feb 2014
Edited: mmenvo on 25 Feb 2014
1. What I meant by 'blind test' is testing the output of the trained network on totally new input data (which was not used for training). Moreover, the original dataset (used in training) was scaled (-1 to 1), but the new input dataset was not scaled by any function. I do not know how to do that. If the net automatically scales new data, then the output should not be abnormal, yet that is what I am getting.
2. The original input contains 5 variables and 165240 samples, and the original target contains 1 variable and 165240 samples. The completely new input data (which I am feeding to the ANN to retrieve output values) contains 5 variables and 35 samples, so the output would be 1 variable and 35 samples. The problem is that I am getting exactly the same output value for all 35 samples. Why is this happening? I hope you can help me solve this problem.
1. Only training and validation data are used for design
2. Neither original test data set nor new test data is used for design.
3. If the original data was automatically normalized, the new test data will also be automatically normalized in ynew=sim(net,xnew) or ynew = net(xnew).
4. Compare
minmaxx = minmax(x)
meanx2 = mean(x,2)
stdx2 = std(x,0,2)
with the result when x is replaced by xnew.
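One possible explanation for identical outputs, offered as an assumption that the thread does not confirm: if the new inputs are in different units than the 0-1 training inputs, the automatic normalization maps them far outside [-1, 1] and the tansig hidden units saturate. The sketch below reproduces this with plain arithmetic. The weight `w`, bias `b`, and all data values are hypothetical, `map` mimics the default min-max mapping to [-1, 1], and `tanh` stands in for `tansig`, which is mathematically equivalent.

```matlab
% Training range of one input variable (hypothetical values).
xmin = 0;
xmax = 1;

% Min-max mapping of the training range onto [-1, 1].
map = @(v) 2*(v - xmin)/(xmax - xmin) - 1;

% New data in the training units versus the same quantities left in
% raw units (assumed scenario, not taken from the thread).
xnew_scaled = [0.2 0.5 0.8];
xnew_raw    = [20 50 80];

% One hidden neuron with a hypothetical weight and bias.
w = 3;
b = 0.5;
h_scaled = tanh(w*map(xnew_scaled) + b);
h_raw    = tanh(w*map(xnew_raw) + b);

disp(h_scaled)  % three clearly different activations
disp(h_raw)     % all saturated at 1, so the net output is constant
```

If the minmax, mean, and std comparison above shows the new data far from the original, bringing `xnew` into the same units as the training inputs would be the first thing to try.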
Many thanks, Greg. I have understood your clarifications on points 1, 2, 3 and 4.
I would be happy if you could advise on the following:
5. What is your recommendation on ANN data division for a very large dataset? (The original training data has over 165240 samples. I also have a different dataset with more than 600000 samples.) In my case I have 5 input neurons and 1 output neuron.
Which one would you recommend (trn-val-tst): i) 70-15-15, ii) 60-20-20, iii) 80-10-10, iv) 70-20-10, or v) 80-20-00?
With N this large, you do not need validation and test subsets. Use
net.divideFcn = 'dividetrain'; % 100/0/0
and try to minimize H, the number of hidden nodes.
Hope this helps.
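A rough sketch of that advice, assuming the same toolbox used earlier in the thread (`fitnet`, `train`). The synthetic data, the search limit `Hmax`, and the stopping threshold `R2goal` are hypothetical choices for illustration, not values from the thread.

```matlab
% Synthetic stand-in data (hypothetical): 5 inputs, 1 target.
rng(0);
N = 2000;
x = rand(5, N);
t = 0.5 + 0.4*sin(2*pi*x(1,:)).*x(2,:);

Hmax   = 6;          % hypothetical upper limit on hidden nodes
R2goal = 0.99;       % hypothetical target for explained variance
MSE00  = var(t, 1);  % MSE of the naive constant-output model
Hbest  = NaN;        % stays NaN if no H reaches the goal

for H = 1:Hmax
    net = fitnet(H);
    net.divideFcn = 'dividetrain';       % use all data for training
    net.trainParam.showWindow = false;   % suppress the training GUI
    net = train(net, x, t);
    y  = net(x);
    R2 = 1 - mean((t - y).^2) / MSE00;   % fraction of variance explained
    fprintf('H = %d, R2 = %.4f\n', H, R2);
    if R2 >= R2goal
        Hbest = H;                       % smallest H that meets the goal
        break
    end
end
```

Because the initial weights are random, results vary between runs, so several trials per value of H give a more reliable picture.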


Asked: 23 Feb 2014

Edited: 26 Feb 2014
