Asked by Prabhakar
on 18 Jan 2011

My neural network either does not reach its error goal or takes too long and uses too much memory to train.

Answer by Akshay
on 18 Jan 2011

Edited by John Kelly
on 27 May 2014

Poor neural network training performance typically falls into one of two situations:

1. The error does not reach the goal. If this occurs, you can try the following:

a. Raise the error goal of the training function. While it may seem that the lowest possible error goal is best, an unreachably low goal can prolong training needlessly and hinder network generalization. For example, if you are using the TRAINLM function, the default error goal is 0, which in practice cannot be reached. You may want to raise the goal to 1e-6 so that the network is capable of reaching it:

net = newff([0 10],[5 1],{'tansig' 'purelin'});  % 1 input in [0 10], 5 hidden and 1 output neuron
net.trainParam.goal = 1e-6;                      % raise the goal from the unreachable default of 0
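For completeness, a minimal sketch of training such a network on made-up one-dimensional data (P and T below are purely illustrative):

P = 0:0.5:10;            % illustrative inputs spanning the [0 10] range
T = sin(P);              % illustrative targets
net = train(net, P, T);  % stops when the 1e-6 goal (or another stopping criterion) is met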

b. You may want to use a different performance function. The default performance function is MSE (the mean squared error). Switching to MSEREG, which adds a weight-regularization term, can improve the generalization of the neural network. To set the performance function to MSEREG:

net.performFcn = 'msereg';     % mean squared error with regularization
net.performParam.ratio = 0.5;  % weight of the error term relative to the weight-size term
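With MSEREG the performance is computed as ratio*mse + (1-ratio)*msw, where msw is the mean squared weight and bias value. A hypothetical adjustment:

net.performParam.ratio = 0.8;  % weight the error term more heavily than the weight-size term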

However, the more a network is generalized, the more difficult it is to achieve the lowest error. You may therefore want to sacrifice some generalization and improve the fit by raising the performance ratio, which lies in the range [0 1], as in the example above. For more information on improving network generalization, see the Related Solution listed at the bottom of the page.

c. You may want to increase the number of epochs for training in some situations. This takes longer but may yield more accurate results. To set the number of epochs, extend the example above as follows:

net.trainParam.epochs = 1000;  % allow up to 1000 training epochs

2. The next issue that arises in neural network training is the speed and memory usage of training a network to reach the goal. The following are some suggestions for addressing these issues:

a. You may want to preprocess your data to make network training more efficient. Preprocessing scales the inputs so that they fall into the range [-1 1]. If preprocessing is applied before training, then the corresponding postprocessing must be applied to the network outputs in order to analyze the results in the original units afterward. For more information on preprocessing and postprocessing, please refer to Chapter 5 of the Neural Network Toolbox User's Guide.
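A minimal sketch using MAPMINMAX, assuming P and T are input and target matrices with one sample per column:

[pn, ps] = mapminmax(P);           % scale each input row to [-1 1]; ps stores the settings
[tn, ts] = mapminmax(T);           % scale the targets the same way
net = train(net, pn, tn);          % train on the scaled data
an = sim(net, pn);                 % network outputs are in the scaled range
a = mapminmax('reverse', an, ts);  % postprocess back to the original units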

b. You may also want to try different training functions. TRAINLM is the most widely used because it is the fastest, but it requires significantly more memory than other training functions. If you want to keep using it while reducing its memory footprint, raise the "mem_reduc" property:

net.trainParam.mem_reduc = 2;  % compute the Jacobian in 2 pieces instead of all at once

This reduces the memory required by roughly a factor of 2, but training takes longer.

c. Reduce the number of neurons being used. It is not necessary to have as many hidden neurons as input parameters; in fact, the number of hidden neurons can usually be significantly smaller than the number of input parameters. For example, if your input P has 200 input parameters, it is not necessarily beneficial to use 200 neurons in the first layer; you may want to try 20 instead (see the sketch at the end of this answer). It is very difficult to give an exact ratio of input parameters to hidden neurons because each application calls for a specific network architecture.

This resolution is intended as a general guideline with suggestions for improving neural network performance. For more information on any of these topics, please refer to the Neural Network Toolbox User's Guide.
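To illustrate suggestion (c), a hypothetical sketch assuming P is a 200xQ input matrix and T a 1xQ target vector:

net = newff(minmax(P), [20 1], {'tansig' 'purelin'});  % 20 hidden neurons rather than 200
net = train(net, P, T);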


Answer by Greg Heath
on 18 Jan 2013

You have not given enough information.

Regression or classification?

What sizes are the input and target matrices?

Are they standardized, normalized and/or transformed?

What network creation function?

How many hidden nodes?

What creation and training defaults have you overwritten?

Sample code would help enormously.

Greg

Behzad Fotovvati
on 5 Dec 2019 at 18:15

Hi Greg,

I have the same regression problem and I would appreciate it if you could help me with it. I have used the NN fitting app, which uses a two-layer feed-forward network. I have four inputs and three outputs. They are neither standardized, normalized, nor transformed. I have only 45 observations (training: 35, validation: 5, and testing: 5), so I chose five neurons for my hidden layer. I employed the Levenberg-Marquardt training algorithm.

First of all, because my three target value sets have different ranges (one is between 5 and 20, one is around 100, and one is between 350 and 380), I believe the high R-squared that I get (99.99%) is not real, and the "predicted vs. actual" plot has three distinct regions (figure attached). So, should I standardize/normalize/transform my response values before feeding them to the network?

Then, how can I improve the performance of this network? The following is the function that was generated by the app. I have also attached the input and target MAT-files.

Thank you in advance,

Behzad

function [Y,Xf,Af] = NN_function(X,~,~)
%MYNEURALNETWORKFUNCTION neural network simulation function.
%
% Auto-generated by MATLAB, 22-Nov-2019 17:00:32.
%
% [Y] = myNeuralNetworkFunction(X,~,~) takes these arguments:
%
%   X = 1xTS cell, 1 inputs over TS timesteps
%   Each X{1,ts} = Qx4 matrix, input #1 at timestep ts.
%
% and returns:
%   Y = 1xTS cell of 1 outputs over TS timesteps.
%   Each Y{1,ts} = Qx3 matrix, output #1 at timestep ts.
%
% where Q is number of samples (or series) and TS is the number of timesteps.

%#ok<*RPMT0>

if nargin == 0
    load('input.mat', 'input');
    X = input;
end

% ===== NEURAL NETWORK CONSTANTS =====

% Input 1
x1_step1.xoffset = [170;900;100;20];
x1_step1.gain = [0.0125;0.00333333333333333;0.025;0.05];
x1_step1.ymin = -1;

% Layer 1
b1 = [3.1578500830052504966;4.4952071974983596192;-0.88364164245700427269;-1.1702983217913538461;-3.5182431864255532261];
IW1_1 = [-1.0819794624949294892 2.6692769892964860468 -7.6209503796131876641 3.312060992159889139;-4.7788086572971595345 4.1485007161535563114 0.83416670793134528594 3.8627684555966799174;-0.14205550613852135911 0.33459277147998212065 0.28679737241542646586 0.34985936950154811198;-2.0561896360862452759 1.0965988493366791712 1.5442399301331737327 -0.11614043990841006748;-0.89033979322261624922 2.5551531766568640336 1.9479683094053272807 -6.1523826340108769273];

% Layer 2
b2 = [0.34606223263192292805;-0.0033838724860873262146;0.51080999686766304091];
LW2_1 = [-0.027811241786895000982 0.028835288395292278663 -0.065274177726248938658 -0.44062348614705032501 0.0024791607847241547979;-0.26034435162446623035 0.23517629033311457376 0.13229333876255239266 -0.77292615570020328786 0.090965199728923057387;0.033716941955813012344 -0.032639694615536805899 1.4554221825175619465 0.12942922726259944999 0.0085852049359890665603];

% Output 1
y1_step1.ymin = -1;
y1_step1.gain = [0.79554494828958;0.0577867668303958;0.0582241630276565];
y1_step1.xoffset = [97.379;337.83;4.49];

% ===== SIMULATION ========

% Format Input Arguments
isCellX = iscell(X);
if ~isCellX
    X = {X};
end

% Dimensions
TS = size(X,2); % timesteps
if ~isempty(X)
    Q = size(X{1},1); % samples/series
else
    Q = 0;
end

% Allocate Outputs
Y = cell(1,TS);

% Time loop
for ts = 1:TS
    % Input 1
    X{1,ts} = X{1,ts}';
    Xp1 = mapminmax_apply(X{1,ts},x1_step1);

    % Layer 1
    a1 = tansig_apply(repmat(b1,1,Q) + IW1_1*Xp1);

    % Layer 2
    a2 = repmat(b2,1,Q) + LW2_1*a1;

    % Output 1
    Y{1,ts} = mapminmax_reverse(a2,y1_step1);
    Y{1,ts} = Y{1,ts}';
end

% Final Delay States
Xf = cell(1,0);
Af = cell(2,0);

% Format Output Arguments
if ~isCellX
    Y = cell2mat(Y);
end
end

% ===== MODULE FUNCTIONS ========

% Map Minimum and Maximum Input Processing Function
function y = mapminmax_apply(x,settings)
y = bsxfun(@minus,x,settings.xoffset);
y = bsxfun(@times,y,settings.gain);
y = bsxfun(@plus,y,settings.ymin);
end

% Sigmoid Symmetric Transfer Function
function a = tansig_apply(n,~)
a = 2 ./ (1 + exp(-2*n)) - 1;
end

% Map Minimum and Maximum Output Reverse-Processing Function
function x = mapminmax_reverse(y,settings)
x = bsxfun(@minus,y,settings.ymin);
x = bsxfun(@rdivide,x,settings.gain);
x = bsxfun(@plus,x,settings.xoffset);
end
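A minimal usage sketch (the sample values below are purely illustrative, chosen to lie within the training input ranges):

Xtest = [200 1000 120 30;
         250 1200 150 40];   % 2 samples x 4 inputs
Ypred = NN_function(Xtest);  % returns a 2x3 matrix of predicted outputs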
