Unable to perform assignment because the size of the left side is 100-by-198 and the size of the right side is 100-by-98. Error in backgroundSpectrograms (line 50) Xbkg(:,:,:,ind) = filterBank * spec;
I am trying to compute the background spectrograms; the recordings are the same as in https://www.mathworks.com/help/deeplearning/examples/deep-learning-speech-recognition.html and it gives me this error:
Warning:
The FFT length is too small to compute the specified number of
bands. Decrease the number of bands or increase the FFT length.
> In designAuditoryFilterBank (line 104)
In backgroundSpectrograms (line 20)
Unable to perform assignment because the size of the left side is
100-by-198 and the size of the right side is 100-by-98.
Error in backgroundSpectrograms (line 50)
Xbkg(:,:,:,ind) = filterBank * spec;
I don't know how to fix it. The background files are the same as in the example, so I don't know what the error is about. Please help me fix it. These are my settings:
ads = 1x1 audioDatastore
numBkgClips = 4000
volumeRange = [1e-4,1]
segmentDuration= 2
hopDuration = 0.010
numBands = 100
frameDuration = 0.025
FFT length = 512 for backgroundSpectrograms
Please help me with the values. If I set the FFT length to 1000, the warning goes away but the error stays.
I have to use these values for hopDuration, numBands, frameDuration, and segmentDuration because of my own wav files.
When I try to run
adsBkg = subset(ads0,ads0.Labels=="_background_noise_");
numBkgClips = 4000;
volumeRange = [1e-4,1];
XBkg = backgroundSpectrograms(adsBkg,numBkgClips,volumeRange,segmentDuration,frameDuration,hopDuration,numBands);
XBkg = log10(XBkg + epsil);
it gives me the above error.
backgroundSpectrograms.m
% backgroundSpectrograms(ads,numBkgClips,volumeRange,segmentDuration,frameDuration,hopDuration,numBands)
% calculates numBkgClips spectrograms of background clips taken from the
% audio files in the |ads| datastore. Approximately the same number of
% clips is taken from each audio file. Before calculating spectrograms, the
% function rescales each audio clip with a factor sampled from a
% log-uniform distribution in the range given by volumeRange.
% segmentDuration is the total duration of the speech clips (in seconds),
% frameDuration the duration of each spectrogram frame, hopDuration the
% time shift between each spectrogram frame, and numBands the number of
% frequency bands.
function Xbkg = backgroundSpectrograms(ads,numBkgClips,volumeRange,segmentDuration,frameDuration,hopDuration,numBands)
disp("Computing background spectrograms...");
fs = 16e3;
FFTLength = 512;
persistent filterBank
if isempty(filterBank)
    filterBank = designAuditoryFilterBank(fs,'FrequencyScale','bark',...
        'FFTLength',FFTLength,...
        'NumBands',numBands,...
        'FrequencyRange',[50,7000]);
end
logVolumeRange = log10(volumeRange);
numBkgFiles = numel(ads.Files);
numClipsPerFile = histcounts(1:numBkgClips,linspace(1,numBkgClips,numBkgFiles+1));
numHops = segmentDuration/hopDuration - 2;
Xbkg = zeros(numBands,numHops,1,numBkgClips,'single');
ind = 1;
for count = 1:numBkgFiles
    wave = read(ads);
    frameLength = frameDuration*fs;
    hopLength = hopDuration*fs;
    for j = 1:numClipsPerFile(count)
        indStart = randi(numel(wave)-fs);
        logVolume = logVolumeRange(1) + diff(logVolumeRange)*rand;
        volume = 10^logVolume;
        x = wave(indStart:indStart+fs-1)*volume;
        x = max(min(x,1),-1);
        [~,~,~,spec] = spectrogram(x,hann(frameLength,'periodic'),frameLength - hopLength,FFTLength,'onesided');
        Xbkg(:,:,:,ind) = filterBank * spec;
        if mod(ind,1000)==0
            disp("Processed " + string(ind) + " background clips out of " + string(numBkgClips))
        end
        ind = ind + 1;
    end
end
disp("...done");
end
2 Comments
imtiaz waheed
on 6 Feb 2020
numBkgClips = 4000;
volumeRange = [1e-4,1];
segmentDuration= 2;
hopDuration = 0.010;
numBands = 100;
frameDuration = 0.025;
FFTlength = 1024;
adsBkg = subset(ads,ads.Labels=='_background_noise_');
% ads is your datastore
XBkg = backgroundSpectrograms(adsBkg,numBkgClips,volumeRange,segmentDuration,frameDuration,hopDuration,numBands,FFTlength);
disp('Computing background spectrograms...');
logVolumeRange = log10(volumeRange);
numBkgFiles = numel(ads.Files);
numClipsPerFile = histcounts(1:numBkgClips,linspace(1,numBkgClips,numBkgFiles+1));
numHops = segmentDuration/hopDuration - 2;
Xbkg = zeros(numBands,numHops,1,numBkgClips,'single');
ind = 1;
for count = 1:numBkgFiles
    [wave,info] = read(ads);
    fs = info.SampleRate;
    frameLength = frameDuration*fs;
    hopLength = hopDuration*fs;
    for j = 1:numClipsPerFile(count)
        indStart = randi(numel(wave)-fs);
        logVolume = logVolumeRange(1) + diff(logVolumeRange)*rand;
        volume = 10^logVolume;
        x = wave(indStart:indStart+fs-1)*volume;
        x = max(min(x,1),-1);
        Xbkg(:,:,:,ind) = melSpectrogram(x,fs, ...
            'WindowLength',frameLength, ...
            'OverlapLength',frameLength - hopLength, ...
            'FFTLength',512, ...
            'NumBands',numBands, ...
            'FrequencyRange',[50,7000]);
        if mod(ind,1000)==0
            disp('Processed ' + string(ind) + ' background clips out of ' + string(numBkgClips))
        end
        ind = ind + 1;
    end
end
disp('...done');
Answers (2)
jibrahim
on 7 Jan 2020
Hi Barb,
There are two problems:
1) Since you asked for 100 bands in the auditory filter bank, the hard-coded FFT length (512) is too small. 1024 should work.
2) The code hard-codes the expected segment duration to 1 second (by using fs here: x = wave(indStart:indStart+fs-1)*volume;), so each clip produces fewer spectrogram frames than the preallocated array expects (see the quick check below).
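As a rough check of where 100-by-198 versus 100-by-98 comes from, compare the number of columns the preallocation expects with the number of frames a 1-second clip actually produces (assuming fs = 16e3 and the values you listed):
fs = 16e3;
segmentDuration = 2; frameDuration = 0.025; hopDuration = 0.010;
frameLength = frameDuration*fs;                             % 400 samples
hopLength = hopDuration*fs;                                 % 160 samples
numHops = segmentDuration/hopDuration - 2                   % 198 columns preallocated in Xbkg
framesPerClip = floor((1*fs - frameLength)/hopLength) + 1   % 98 columns produced by a 1-second clip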
I modified and attached the code. This should run now:
numBkgClips = 4000;
volumeRange = [1e-4,1];
segmentDuration= 2;
hopDuration = 0.010;
numBands = 100;
frameDuration = 0.025;
FFTlength = 1024;
adsBkg = subset(ads,ads.Labels=="_background_noise_");
% ads is your datastore
XBkg = backgroundSpectrograms(adsBkg,numBkgClips,volumeRange,segmentDuration,frameDuration,hopDuration,numBands,FFTlength);
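The attached file is not reproduced in this thread, but here is a minimal sketch of what the modified backgroundSpectrograms could look like, assuming the only changes are passing FFTLength in as an argument and cutting clips of segmentDuration seconds instead of a hard-coded 1 second (the actual attachment may differ in details):
function Xbkg = backgroundSpectrograms(ads,numBkgClips,volumeRange,segmentDuration,frameDuration,hopDuration,numBands,FFTLength)
% Sketch only: FFTLength is now an input, and each clip is segmentDuration
% seconds long instead of the hard-coded 1 second.
disp("Computing background spectrograms...");
fs = 16e3;
% Rebuild the filter bank on every call; the persistent cache in the
% original would not pick up a changed FFTLength or numBands.
filterBank = designAuditoryFilterBank(fs,'FrequencyScale','bark',...
    'FFTLength',FFTLength,...
    'NumBands',numBands,...
    'FrequencyRange',[50,7000]);
logVolumeRange = log10(volumeRange);
numBkgFiles = numel(ads.Files);
numClipsPerFile = histcounts(1:numBkgClips,linspace(1,numBkgClips,numBkgFiles+1));
numHops = segmentDuration/hopDuration - 2;
segmentLength = round(segmentDuration*fs);   % samples per clip (was fs, i.e. 1 s)
Xbkg = zeros(numBands,numHops,1,numBkgClips,'single');
ind = 1;
for count = 1:numBkgFiles
    wave = read(ads);   % assumes each background file is longer than segmentDuration
    frameLength = frameDuration*fs;
    hopLength = hopDuration*fs;
    for j = 1:numClipsPerFile(count)
        indStart = randi(numel(wave)-segmentLength);
        logVolume = logVolumeRange(1) + diff(logVolumeRange)*rand;
        volume = 10^logVolume;
        x = wave(indStart:indStart+segmentLength-1)*volume;
        x = max(min(x,1),-1);
        [~,~,~,spec] = spectrogram(x,hann(frameLength,'periodic'),frameLength - hopLength,FFTLength,'onesided');
        Xbkg(:,:,:,ind) = filterBank * spec;   % 100-by-198 with the values above
        if mod(ind,1000)==0
            disp("Processed " + string(ind) + " background clips out of " + string(numBkgClips))
        end
        ind = ind + 1;
    end
end
disp("...done");
end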
5 Comments
jibrahim
on 16 Jan 2020
Make sure that the argument to the fullyConnectedLayer that precedes the softmaxLayer is equal to the number of classes you are trying to classify. It seems like you have 4 classes, but you are using fullyConnectedLayer(3). If you indeed have 3 classes, then maybe the categorical validation array you are supplying has an unused category. You can remove it using removecats:
YValidation = removecats(YValidation);
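A minimal sketch of that check, assuming YTrain and YValidation are the categorical label arrays from the example:
YTrain = removecats(YTrain);              % drop categories with no observations
YValidation = removecats(YValidation);
numClasses = numel(categories(YTrain));   % number of classes the network must output
layersTail = [                            % the final layers of the network should then be
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];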
Barb
on 22 Jan 2020
1 Comment
jibrahim
on 23 Jan 2020
Make sure the size of the image going into your network matches the image size you used in training:
[YPredicted,probs] = classify(trainedNet,spec,'ExecutionEnvironment','cpu');
It looks like the size of spec is not [100 98 1].
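One quick way to compare the two sizes, assuming trainedNet is the network returned by trainNetwork and its first layer is the image input layer:
size(spec)                       % spectrogram actually passed to classify
trainedNet.Layers(1).InputSize   % input size the network was trained with; the two must match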
I remember you were generating spectrograms based on 2-second segments. Make sure waveBuffer indeed holds 2 seconds. I think the original demo uses one second, so you might have to slightly change these three lines of code:
x = audioIn();
waveBuffer(1:end-numel(x)) = waveBuffer(numel(x)+1:end);
waveBuffer(end-numel(x)+1:end) = x;
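For example, a minimal sketch of the buffer setup for 2-second segments, assuming fs = 16e3 and that audioIn is the audioDeviceReader object from the example:
fs = 16e3;
segmentDuration = 2;                        % seconds of audio the buffer should hold
waveBuffer = zeros(fs*segmentDuration,1);   % 2 s instead of the original 1 s
% inside the streaming loop, shift in each new frame as before:
x = audioIn();
waveBuffer(1:end-numel(x)) = waveBuffer(numel(x)+1:end);
waveBuffer(end-numel(x)+1:end) = x;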