Neural networks stuttering detection using MFCC features

Rahaf Alenezi on 6 Apr 2019
Answered: jibrahim on 22 Apr 2019
Hello,
I am working on a project where I need to detect stuttered sounds in speech using a pattern recognition neural network. The network works fine at detecting non-stuttered sounds. However, I have 777 samples of stuttered sounds, and after training on 13 MFCC features the network still does not detect stuttering.
I think I might have been extracting the features incorrectly, but I followed the documentation for the mfcc function.
The neural network results show that it does not perform well on stuttering detection. This is my code for feature extraction; I'm using the pattern recognition network from nnstart. Could you please tell me what I am doing wrong? Thank you in advance.
files = dir('data1s/*.wav');
unstutteredFeatures = [];
stutteredFeatures = [];
targetVector = zeros(2, numel(files));   % one-hot targets, one column per file
for i = 1 : numel(files)
    fileName = strcat('data1s/', files(i).name);
    newSample = [];
    [x, fs] = audioread(fileName);
    % Per-frame MFCCs, with the first coefficient replaced by log energy
    [y] = mfcc(x, fs, 'LogEnergy', 'Replace');
    if fileName(8) == '0'   % data were previously labeled
        % Average over all frames: one 1-by-13 vector per file
        newSample = vertcat(newSample, mean(y));
        unstutteredFeatures = vertcat(unstutteredFeatures, newSample);
        targetVector(1, i) = 1;
    else
        newSample = vertcat(newSample, mean(y));
        stutteredFeatures = vertcat(stutteredFeatures, newSample);
        targetVector(2, i) = 1;
    end
end
% Unstuttered rows first, then stuttered rows
inputMatrix = vertcat(unstutteredFeatures, stutteredFeatures);
targetVector = targetVector';

Answers (1)

jibrahim on 22 Apr 2019
Hi Rahaf,
The mfcc function returns an L-by-13 matrix, where L is the number of frames the audio signal is partitioned into and 13 is the number of coefficients. In your code, you take the mean of the mfcc output over the frames, so you end up with a single 1-by-13 vector for each file, regardless of how long the file is. This might not be enough data to decide whether stuttering occurred.
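To illustrate why averaging can hide the information the network needs, here is a minimal sketch using synthetic data in place of real mfcc output (the matrix sizes and random values are assumptions for illustration only). Two "recordings" with the same frames in a different order have different temporal patterns but essentially identical mean vectors:

```matlab
% Synthetic stand-in for mfcc output: L frames by 13 coefficients.
% (Hypothetical data; real values would come from mfcc(x, fs, ...).)
L = 100;
numCoeffs = 13;
rng(0);                                  % make the example reproducible
original = randn(L, numCoeffs);          % one "recording"
shuffled = original(randperm(L), :);     % same frames, different order in time

% What the code in the question keeps for each file: a 1-by-13 vector
meanOriginal = mean(original, 1);
meanShuffled = mean(shuffled, 1);

% The recordings differ frame by frame ...
framesDiffer = ~isequal(original, shuffled);
% ... but their averaged features agree up to rounding error
maxMeanGap = max(abs(meanOriginal - meanShuffled));
fprintf('Frames differ: %d, largest gap between means: %g\n', ...
    framesDiffer, maxMeanGap);
```

Any cue that depends on how the coefficients evolve over time, such as a repeated sound, is lost once each file is reduced to its mean.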
Your problem setup is somewhat similar to a gender classification example in the product.
The example uses gtcc, but you can easily use mfcc instead (and drop the other features from the example if they do not help). Rather than reducing coefficients to their means, the entire sequence of coefficients is fed to a deep learning network.
HTH,
Jihad
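
As a rough sketch of the sequence-based approach suggested in the answer, the per-file coefficient matrices can be kept whole and stored in a cell array instead of being averaged. The frame counts, labels, and class names below are hypothetical, and random numbers stand in for real mfcc output. Sequence networks in Deep Learning Toolbox generally expect each sequence as a features-by-time-steps matrix, hence the transpose:

```matlab
% Hypothetical stand-ins for three files; on real audio, coeffs would be
% the L-by-13 output of mfcc(x, fs, 'LogEnergy', 'Replace').
rng(0);
frameCounts = [98 120 75];               % assumed number of frames per file
labels = [0 1 0];                        % assumed labels: 0 = fluent, 1 = stuttered

numFiles = numel(frameCounts);
sequences = cell(numFiles, 1);
for k = 1:numFiles
    coeffs = randn(frameCounts(k), 13);  % placeholder for the mfcc output
    sequences{k} = coeffs.';             % 13-by-L: features in rows, frames in columns
end

% One categorical label per sequence
Y = categorical(labels(:), [0 1], {'fluent', 'stuttered'});
```

Sequences of different lengths can stay as they are in the cell array; each file keeps all of its frames, so the network can see how the coefficients change over time.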
