Difficulties training an ANN with multiple outputs: outputs are always constant

Does anyone have experience with defining a neural network that has multiple outputs? I want to input a vector and output both a vector and a matrix. Accordingly, I need a DAG network.
I realize that I need a custom training loop for this; see: https://de.mathworks.com/help/deeplearning/ug/train-network-with-multiple-outputs.html
The good news is that the training code basically runs. I have chosen the following network architecture:
numNeurons = 10;

% Input (shared trunk)
layers1 = [
    featureInputLayer(size(XData,2),'Name','Param_Input','Normalization','rescale-symmetric')
    fullyConnectedLayer(numNeurons)
    batchNormalizationLayer
    tanhLayer
    fullyConnectedLayer(numNeurons)
    batchNormalizationLayer
    tanhLayer
    fullyConnectedLayer(numNeurons)
    batchNormalizationLayer
    tanhLayer('Name','tanh_middle')
    ];
lgraph = layerGraph(layers1);

% Output 1 (matrix head)
filterSize = dimOutput{1};
numFilters = 20;
strideSize = [1,1];
projectionSize = [1,1,size(XData,2)];
layers2 = [
    fullyConnectedLayer(numNeurons,'Name','fcEF')
    batchNormalizationLayer
    tanhLayer
    fullyConnectedLayer(numNeurons)
    batchNormalizationLayer
    tanhLayer
    projectAndReshapeLayerNew(projectionSize)
    transposedConv2dLayer(filterSize,numFilters,'Stride',strideSize,'Cropping','same')
    batchNormalizationLayer
    tanhLayer
    transposedConv2dLayer(filterSize,1,'Stride',strideSize,'Name','Output1')
    ];

% Output 2 (vector head)
layers3 = [
    fullyConnectedLayer(numNeurons,'Name','fcFreq')
    batchNormalizationLayer
    tanhLayer
    fullyConnectedLayer(numNeurons)
    batchNormalizationLayer
    tanhLayer
    fullyConnectedLayer(dimOutput{2},'Name','Output2')
    ];

lgraph = addLayers(lgraph,layers2);
lgraph = addLayers(lgraph,layers3);
lgraph = connectLayers(lgraph,"tanh_middle","fcEF");
lgraph = connectLayers(lgraph,"tanh_middle","fcFreq");
% [...] Training
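% For reference, a custom training loop for this two-output network would
% compute a combined loss as in the linked example. A minimal sketch, assuming
% a dlnetwork "net", a mini-batch "X" and regression targets "T1"/"T2"
% (placeholder names, not the poster's actual code):
%
%   function [loss,gradients] = modelLoss(net,X,T1,T2)
%       [Y1,Y2] = forward(net,X);                    % matrix head, vector head
%       loss = mse(Y1,T1) + mse(Y2,T2);              % combined regression loss
%       gradients = dlgradient(loss,net.Learnables);
%   end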
% "Assemble Multiple-Output Network for Prediction"
lgraphNew = layerGraph(trainedNet);
layerReg1 = regressionLayer(Name="regOutput1");
layerReg2 = regressionLayer(Name="regOutput2");
lgraphNew = addLayers(lgraphNew,layerReg1);
lgraphNew = addLayers(lgraphNew,layerReg2);
lgraphNew = connectLayers(lgraphNew,"Output1","regOutput1");
lgraphNew = connectLayers(lgraphNew,"Output2","regOutput2");
figure
plot(lgraphNew)
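For completeness, the assembled network would then be created and used for prediction as in that doc page. A minimal sketch (XTest is a placeholder for your test inputs):

assembledNet = assembleNetwork(lgraphNew);       % attach trained weights, no retraining
[YPred1,YPred2] = predict(assembledNet,XTest);   % one call returns both outputs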
However, the problem is that all outputs (the coefficients of the vector and of the matrix) are identical across samples. Apparently the network learns some average values instead of fitting the actual training targets:
Output 1 (all predicted matrices are the same): [figure omitted]
Output 2 (all predicted "vectors"/rows are the same): [figure omitted]
Is the network architecture simply unsuitable? What could be the reason? I would rule out the training data as the cause, since training succeeds when I train separate single-output ANNs.
Thank you and best regards.

Answers (1)

Venu on 8 Jan 2024
In your case, I suspect your custom layer 'ProjectAndReshapeLayer'. Check its weight initialization, consider applying regularization, check how the projection matrix is learned, and verify that the reshaping operation is appropriate for the specific output type. It is important to verify that this layer is not inadvertently causing the network to learn average values rather than distinct representations for each output; one way to inspect the learned projection weights is sketched below.
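A minimal sketch for inspecting whether the projection layer's parameters are actually being learned, via the network's Learnables table ("proj" is a placeholder for the actual layer name, and trainedNet is assumed to be the trained dlnetwork):

lrn = trainedNet.Learnables;                % table with Layer, Parameter, Value
projRows = lrn(lrn.Layer == "proj",:);      % rows belonging to the projection layer
stats = cellfun(@(v) [min(extractdata(v),[],"all") max(extractdata(v),[],"all")], ...
    projRows.Value, "UniformOutput", false) % rough spread of each parameter

If the weights barely move away from their initialization over the course of training, that points to an initialization or gradient-flow problem in this layer.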
Also try adding another fully connected layer at the end of layers2 to increase capacity. The additional complexity can help the network capture more nuanced representations for the first output, especially if the previous layers were not expressive enough.
2 Comments
Udit06 on 9 Jan 2024
I would like to add one more point to the above answer. In a multi-output scenario, the total loss is usually a combination of the individual losses for each output, and the training objective is to minimize this total loss. If one loss dominates, the network may focus on optimizing that particular output at the expense of the others, leading to poor performance on the less weighted tasks. To handle this, you can assign a weight to each loss component to balance their contributions to the total loss (see the sketch below).
I hope this helps.
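A minimal sketch of such a weighting inside the loss function (w1 and w2 are hypothetical weights to be tuned; loss1 and loss2 are the per-output losses from the training-loop sketch above):

w1 = 1;  w2 = 10;                % example values only, to be tuned
loss = w1*loss1 + w2*loss2;      % weighted combination of the two losses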
Clemens H. on 9 Jan 2024
First of all, thank you very much for your answers! Unfortunately, the problem persists, so I would like to address a few points:
1) The described problem did not occur with single-output NNs; in each case the outputs could be learned separately just fine. I therefore assumed that the structure of each of the two NN "arms" was basically fine. So what is wrong?
2) @Venu I took the "ProjectAndReshapeLayer" from the MATLAB help topic "Train Generative Adversarial Network (GAN)"; see the corresponding live script: https://de.mathworks.com/help/deeplearning/ug/train-generative-adversarial-network.html
When I only wanted to output the matrix (single-output variant), it worked, so I do not know what I should fundamentally change.
3) @Venu Unfortunately, I cannot add an FC layer at the end of layers2: the implicit flattening performed by the FC layer would mean the output is no longer a matrix, which produces an error regarding the output dimensions. Additional FC layers in other places have unfortunately not helped so far.
4) @Udit06 Thanks for this hint as well. Even in the limiting cases where I weight 100% Output1 or, alternatively, 100% Output2, the result does not change: the NN output is still constant across all samples, as described in the question. How can that be?

