Low LSTM Accuracy in Speech Recognition

Question

Hamza on 31 Oct 2023


Direct link to this question: https://de.mathworks.com/matlabcentral/answers/2041026-low-lstm-accuracy-in-speech-recognition

Commented: Christopher McCausland on 6 Nov 2023

Hello everyone, I am applying an LSTM to speech emotion recognition. I extracted MFCC features, which gave a matrix of 60,575 × 39 (frames × features). I then split this matrix into a 280 × 1 cell array named "AllCellTrain" containing sequences of varying lengths, as shown in the attached image. I used "AllCellTrain" as input to the trainNetwork function, together with the labels YCA, the network layers, and the training options. However, the accuracy is very low, only around 20%. I'm not sure where I went wrong. Could someone please help?
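One way to do the matrix-to-cell conversion described above is sketched below. It assumes a hypothetical vector `frameCounts` holding the number of MFCC frames in each utterance (280 entries summing to 60,575). For sequence input, `trainNetwork` expects each cell to be a features-by-time-steps matrix (39 × T here), so the T × 39 row blocks are transposed.

```matlab
% Sketch: split a stacked (totalFrames x numFeatures) MFCC matrix into one
% sequence per utterance. frameCounts is a hypothetical vector holding the
% number of frames of each utterance, in the same row order as the matrix.
% mat2cell cuts the rows into T-by-numFeatures blocks (it errors if the counts
% do not add up to the number of rows); each block is then transposed, because
% sequence input for trainNetwork is numFeatures-by-T (features x time steps).
toSequences = @(allFeatures, frameCounts) cellfun(@transpose, ...
    mat2cell(allFeatures, frameCounts(:), size(allFeatures, 2)), ...
    'UniformOutput', false);

% Hypothetical usage: trainFeatures is the 60,575-by-39 matrix and
% trainFrameCounts has 280 entries that sum to 60,575.
% AllCellTrain = toSequences(trainFeatures, trainFrameCounts);
```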

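% Dimensions: 39 MFCC-based features per time step, 7 emotion classes
num_features = 39;
num_classes = 7;
num_hidden_units = 1024;

% Network: sequence input -> LSTM (output of the last time step only) -> classifier
layers = [
    sequenceInputLayer(num_features)
    lstmLayer(num_hidden_units, 'OutputMode', 'last')
    fullyConnectedLayer(num_classes)
    softmaxLayer
    classificationLayer];

% Specify the training options
max_epochs = 36;
mini_batch_size = 28;
initial_learning_rate = 0.001;
options = trainingOptions('adam', ...
    'MaxEpochs', max_epochs, ...
    'MiniBatchSize', mini_batch_size, ...
    'InitialLearnRate', initial_learning_rate, ...
    'SequenceLength', 'shortest', ...   % truncates each mini-batch to its shortest sequence
    'Shuffle', 'every-epoch', ...
    'ExecutionEnvironment', 'gpu', ...
    'Verbose', false, ...
    'Plots', 'training-progress');

% AllCellTrain: 280x1 cell array of sequences; YCA: categorical training labels
net = trainNetwork(AllCellTrain, YCA, layers, options);

% Evaluate on the test set (AllCellTest with categorical labels YCT)
predicted_labels = classify(net, AllCellTest, 'ExecutionEnvironment', 'gpu');
acc = mean(predicted_labels == YCT)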
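With `'SequenceLength','shortest'`, `trainNetwork` truncates every sequence in a mini-batch to the length of the shortest sequence in that batch, so longer utterances can lose many of their frames. A rough way to check how much is discarded is sketched below; it only mimics the batching (the order of `lens` stands in for one shuffled epoch) and is not `trainNetwork`'s internal code.

```matlab
% Sketch: estimate how much data 'SequenceLength','shortest' throws away.
% lens: vector of sequence lengths (time steps) in the order they are batched;
% mbs: mini-batch size. Each batch keeps (batch size) x (shortest length) frames.
framesKeptShortest = @(lens, mbs) sum(accumarray( ...
    ceil((1:numel(lens)).' / mbs), lens(:), [], @(b) numel(b) * min(b)));
keptFraction = @(lens, mbs) framesKeptShortest(lens, mbs) / sum(lens);

% Hypothetical usage with the question's variables (one shuffled epoch):
% seqLengths = cellfun(@(s) size(s, 2), AllCellTrain);
% keptFraction(seqLengths(randperm(numel(seqLengths))), 28)
```

If the kept fraction is low, `'SequenceLength','longest'` pads instead of truncating. The `trainingOptions` documentation notes that with `'OutputMode','last'`, padding at the end of a sequence can influence the LSTM output, and suggests `'SequencePaddingDirection','left'`. `classify` accepts the same two name-value options; its defaults are `'longest'` with right padding, so the code above may handle sequences differently at training and test time.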

4 Comments
(2 older comments not shown)

Hamza on 6 Nov 2023

Edited: Hamza on 6 Nov 2023

Hi @Christopher McCausland, thanks for your answer. I am trying to classify 7 emotion classes. For your information, I have used the same data with a 1D CNN and got 90% accuracy, so I don't know what the issue is with the LSTM. Also, when I shuffled the columns (the features), I got a different result, which shouldn't be the case. Please find the attached curve. Thanks in advance!

Christopher McCausland on 6 Nov 2023

Hi @Hamza,

To me this looks like classic overfitting: your model appears to train well and learn features, but these features are overfitted to the training data and do not generalise to unseen data.
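A hedged sketch of common countermeasures against this kind of overfitting, written against the question's code: a much smaller LSTM, a `dropoutLayer`, and a validation set with early stopping via `'ValidationPatience'`. The hidden size, dropout rate and patience are illustrative guesses, not tuned values, and `XVal`/`YVal` are a hypothetical held-out set that must not overlap the training data.

```matlab
% Sketch of common countermeasures against overfitting. All values are
% illustrative, not tuned for this data set.
num_features = 39;        % MFCC-based features per time step (from the question)
num_classes = 7;          % emotion classes
num_hidden_units = 128;   % hypothetical, much smaller than 1024

layers = [
    sequenceInputLayer(num_features)
    lstmLayer(num_hidden_units, 'OutputMode', 'last')
    dropoutLayer(0.5)     % randomly zeroes elements with probability 0.5, training only
    fullyConnectedLayer(num_classes)
    softmaxLayer
    classificationLayer];

% XVal/YVal: hypothetical validation sequences and categorical labels, disjoint
% from the training data. Wrapped in a function handle so the options can be
% built once the validation set exists.
makeOptions = @(XVal, YVal) trainingOptions('adam', ...
    'MaxEpochs', 36, ...
    'MiniBatchSize', 28, ...
    'InitialLearnRate', 1e-3, ...
    'SequenceLength', 'longest', ...          % pad rather than truncate
    'SequencePaddingDirection', 'left', ...   % padding goes before the real frames
    'Shuffle', 'every-epoch', ...
    'ValidationData', {XVal, YVal}, ...
    'ValidationFrequency', 10, ...            % about once per epoch for 280 sequences
    'ValidationPatience', 5, ...              % early stopping after 5 non-improving checks
    'Verbose', false);

% Hypothetical usage:
% options = makeOptions(XVal, YVal);
% net = trainNetwork(XTrain, YTrain, layers, options);
```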

A few things to consider:

- Do you have multiple speakers? If so, how do you choose which speakers go into the training and test sets?
- You have 280 input sequences and seven classes; if the data is perfectly balanced, that is 40 observations per class. Is this enough?
- Can you include a validation split to catch overfitting early?

These are just a few ways to prevent overfitting and make sure your data is appropriate for training; there are many others that I would suggest you look into.
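The first two questions can be checked directly in MATLAB. The sketch below holds out whole speakers, assuming a hypothetical vector `speakerIDs` with one entry per utterance (the question does not say whether speaker information is available); the commented usage then builds training and validation sets and counts utterances per class with `countcats`.

```matlab
% Sketch: speaker-independent hold-out. Whole speakers are held out, so no
% speaker contributes utterances to both the training and held-out sets.
% speakerIDs (one entry per utterance, in the same order as AllCellTrain)
% is a hypothetical input; it can be numeric, string or categorical.
pickSpeakers = @(speakers, k) speakers(randperm(numel(speakers), k));
heldOutMask  = @(speakerIDs, numHeldOut) ...
    ismember(speakerIDs, pickSpeakers(unique(speakerIDs), numHeldOut));

% Hypothetical usage (numbers are placeholders, not recommendations):
% rng(0);                                   % reproducible choice of speakers
% mask   = heldOutMask(speakerIDs, 2);      % hold out 2 speakers
% XTrain = AllCellTrain(~mask);  YTrain = YCA(~mask);
% XVal   = AllCellTrain(mask);   YVal   = YCA(mask);
% countcats(YTrain)                         % utterances per emotion class
```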

In terms of the CNN performance: were the training and test sets the same, and how many epochs did you train the CNN for?

Answers (0)
