How to change activation function for fully connected layer in convolutional neural network?

I'm in the process of implementing a wavelet neural network (WNN) using the SeriesNetwork class of Neural Network Toolbox v7. While executing a simple network line by line, I can clearly see where the fully connected layer multiplies the inputs by the appropriate weights and adds the bias; however, as best I can tell, no additional calculation is performed for the activations of the fully connected layer. It was my general understanding that standard perceptrons always have an activation/transfer function, and I was fully expecting to see the familiar sigmoid. Instead, it appears that the fully connected layer, as implemented here, uses the identity operation as the transfer function (or, equivalently, no transfer function at all).
1) Do fully connected layers use an activation function, or are the outputs simply the weighted sums of the inputs plus the bias? My initial assumption is no, since I see activations greater than +1 (see the example code at the bottom).
2) If an activation function is used, does anyone have any suggestions on where I might find and/or alter the source? I have examined the FullyConnected class and definition files, as well as the FullyConnectedGPUStrategy and FullyConnectedHostStrategy classes; the latter contain the actual multiplication by the weights and addition of the bias.
3) If I want to use a custom activation function (in this case a wavelet), is it safe to simply apply that transfer function after the weighting and addition of the bias? For example, if I wanted to modify a FullyConnectedLayer to have a tanh activation function, could I simply alter the forward method as follows for the forward pass? (Obviously, changes to the backward pass and gradient computation would also be required for a full implementation.)
classdef FullyConnectedGPUStrategy < nnet.internal.cnn.layer.util.ExecutionStrategy
...
    function [Z, memory] = forward(~, X, weights, bias)
        Z = iForwardConvolveOrMultiply(X, weights);
        Z = Z + bias;
        Z = tanh(Z); % apply the activation after the affine map
        memory = [];
    end
Example code to illustrate the problem:
%Generate training data
[XTrain, YTrain] = digitTrain4DArrayData;
%Define layers
layers = [ ...
    imageInputLayer([28 28 1])
    fullyConnectedLayer(10)
    softmaxLayer()
    classificationLayer()];
%Train network using stochastic gradient descent with momentum
options = trainingOptions('sgdm');
net = trainNetwork(XTrain, YTrain, layers, options);
%View activations of fully connected layer
%Note: When testing this I see activations greater than +1 and
%less than 0, so it can't be using tanh or sigmoid
activations(net,XTrain(:,:,:,1),2)
Note: The reason I chose to use the Series Network class used for CNNs as opposed to the generic Neural Network class is because the output of the WNN will need to act as the input to a CNN which will then be trained together as one unit.
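One way to confirm the identity behaviour directly is to compare the reported activations against a hand-computed affine map. A minimal sketch, assuming the net trained above and R2017a-era property names (AverageImage holds the input layer's zero-center offset; later releases call it Mean, and the output shape of activations may vary by release):
x = double(XTrain(:,:,:,1));
x = x - net.Layers(1).AverageImage;       % undo the input layer's default zero-center normalization
a = activations(net, XTrain(:,:,:,1), 2); % toolbox output of the fully connected layer
W = net.Layers(2).Weights;                % 10-by-784 weight matrix
b = net.Layers(2).Bias;                   % 10-by-1 bias
manual = W * x(:) + b;                    % plain affine map, no nonlinearity
max(abs(a(:) - manual(:)))                % ~0 confirms no activation is applied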

1 Comment

Before reading your question, let me state:
1. I am an engineer, not a mathematician. So, my following statements may not be as precise as some would like. However, I believe it should be perfectly clear what I am stating:
The STANDARD UNIVERSAL APPROXIMATOR single hidden layer regression net has
1. A nonlinear hidden layer transfer function
2. A LINEAR output layer transfer function
I'm stating this because it is obvious that some believe that, for a universal approximator, the standard output transfer function has to be nonlinear.
Of course there are additional conditions on finiteness, etc., which I have omitted, but I think I have made my point.
Hope this helps,
Greg
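In symbols, the standard single-hidden-layer approximator Greg describes is
f(x) = W2 * sigma(W1*x + b1) + b2
with sigma nonlinear (e.g. tanh or a sigmoid) and a purely linear output layer, which is consistent with the identity-output fully connected layer observed in the question.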


Accepted Answer

Activations are added as a separate layer, and in R2017a there is only the ReLU layer (see reluLayer).
Custom layers have not been introduced yet, so you'd have to be hacking or masking the toolbox files, but that's fine. You could take a copy of the ReLU layer classes and modify them, or just edit your MATLAB install directly if you think that's safe.
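For illustration, the substitution amounts to replacing the rectification with tanh in both passes. A sketch of the two methods to edit in the copied files (the surrounding internal class scaffolding differs between releases, so treat the exact signatures as assumptions):
function [Z, memory] = forward(~, X)
    Z = tanh(X);    % elementwise tanh instead of max(0, X)
    memory = [];
end
function dLdX = backward(~, ~, Z, dLdZ, ~)
    % d/dx tanh(x) = 1 - tanh(x)^2, expressed via the stored output Z
    dLdX = (1 - Z.^2) .* dLdZ;
end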

10 Comments

That is exactly what I was looking for, thank you! I'm already at work modifying the necessary files, copied from other layers, for my own implementation.
Thanks again!
I attempted to do this by adding a tanh equivalent of all the ReLU files in the correct locations. However, I receive the error "Undefined function or variable 'tanhLayer'". Could MATLAB be blocking my custom files?
All good. If you add and run the files in an administrator instance of MATLAB, there is no refusal. This leads me to my next question: where does the ReLU function get squashed between [-1, 1]? I would like to change this for different activation functions.
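For reference, ReLU is not squashed to [-1, 1] anywhere: its forward pass is an elementwise clamp with range [0, Inf). A sketch of the one line that changes for a different activation:
Z = max(0, X);  % ReLU: clamps negatives to zero, range [0, Inf)
Z = tanh(X);    % tanh alternative: squashes smoothly into (-1, 1)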
Dear all, I am also working on a CNN that uses fully connected layers as output stages. Especially if you set up a regression network, in my opinion it doesn't make any sense to use several subsequent fully connected layers at the end if they are all linear. However, this is done in many examples. @Joss Knight: Would you mind providing such an adapted layer class?
Thanks, Stephan
Hi, it seems even R2018a doesn't have a sigmoid layer. When can we expect one?
I just found this: one can code a custom deep learning layer as shown in the example below. I am now making my first attempt at coding a SigmoidLayer. I hope you find this useful.
https://ww2.mathworks.cn/help/nnet/ug/define-custom-deep-learning-layer.html#mw_178590f1-d46a-4578-9180-959671d36505
Hi Ayomi, here is my self-coded sigmoid layer, following the instructions at your link. It does not seem to work as well as expected, though. Do you have your sigmoid code ready? Let's do some comparisons...
classdef sigmoidLayer < nnet.layer.Layer
    methods
        function layer = sigmoidLayer(name)
            % Set layer name
            if nargin == 2
                layer.Name = name;
            end
            % Set layer description
            layer.Description = 'sigmoidLayer';
        end
        function Z = predict(~,X)
            % Forward input data through the layer and output the result
            Z = exp(X)./(exp(X)+1);
        end
        function dLdX = backward(~,X, ~,dLdZ,~)
            % Backward propagate the derivative of the loss function through
            % the layer
            dLdX = X.*(1-X) .* dLdZ;
        end
    end
end
Forgot to say: since I do a lot of normalization within the layers and between the channels, the outputs of the neurons are in 0~1, which makes the sigmoid function not as good as ReLU. This might be the reason I did not get results as good as with ReLU, and why MathWorks has not developed a sigmoid layer.
@wenyi I think the backprop has to be: dLdX = Z.*(1-Z).*dLdZ;
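For context on that correction: if Z = 1/(1 + exp(-X)), then dZ/dX = Z*(1 - Z), so the chain rule gives dLdX = Z.*(1-Z).*dLdZ. Writing the gradient in terms of the input X, as in the code above, computes the wrong quantity.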
@wenyi Thank you for your work. However, it has a few mistakes in it.
I know that the topic is old, but I am sure this can help some people, so I am posting my code for the sigmoid layer, based on the versions from @wenyi and @Balakrishnan_Rajan. There was also a mistake with the "~".
classdef sigmoidLayer < nnet.layer.Layer
    methods
        function layer = sigmoidLayer(name)
            % Set layer name if one is given (nargin == 1 when a name is passed)
            if nargin == 1
                layer.Name = name;
            end
            % Set layer description
            layer.Description = 'sigmoidLayer';
        end
        function Z = predict(layer, X)
            % Forward input data through the layer and output the result.
            % 1./(1+exp(-X)) is the numerically safe form of the sigmoid:
            % it stays finite even when exp(X) would overflow.
            Z = 1./(1 + exp(-X));
        end
        function dLdX = backward(layer, X, Z, dLdZ, memory)
            % Backward propagate the derivative of the loss function
            % through the layer; sigmoid'(X) = Z.*(1-Z)
            dLdX = Z.*(1-Z) .* dLdZ;
        end
    end
end
This is accepted by checkLayer.
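As a usage sketch (checkLayer was introduced in R2018a; the input size and the ObservationDimension option here are examples, not requirements of the class):
layer = sigmoidLayer('sig1');
checkLayer(layer, [28 28 1], 'ObservationDimension', 4)  % validate the layer
layers = [ ...
    imageInputLayer([28 28 1])
    fullyConnectedLayer(10)
    sigmoidLayer('sig1')
    softmaxLayer()
    classificationLayer()];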

