i want to use LSTM based audio network to work with Live audio

3 Ansichten (letzte 30 Tage)
Arslan Munim
Arslan Munim am 27 Jul. 2022
Kommentiert: Arslan Munim am 28 Sep. 2022
Hello Matlab team,
I am using this example to work with my audio data set https://www.mathworks.com/matlabcentral/fileexchange/74611-fault-detection-using-deep-learning-classification#examples_tab dataset is trained but I want to make the application live with PC, forexample I have a mic and make an application to use my trained model to predict the output.
Can you guide me or help me with that?
Regards,
Arslan Munaim

Antworten (2)

jibrahim
jibrahim am 27 Jul. 2022
Hi Arslan,
There is a function in that repo (streamingClassifier) that should get the job done in conjunction with an audio device reader:
% Create a microphone object
adr = audioDeviceReader(SampleRate=16e3,SamplesPerFrame=512);
% These statistic value should come from your training...
M = 0;
S = 1;
while 1
% Read a frame of data from microphone
frame = adr();
% Pass to network
scores = streamingClassifier(frame,M,S);
% Use the scores any way you want
end
  5 Kommentare
jibrahim
jibrahim am 2 Aug. 2022
Hi Arslan,
Since you trained the network with a sample rate of 16e3, you will have to perform sample-rate conversion from 44100 kHz to 16 kHz. This code is a possible implementation, where you essentially feed the network frames of length 512 sampled at 16 kHz, just like the original code:
% Create a microphone object
%adr = audioDeviceReader(SampleRate=16e3,SamplesPerFrame=512);
src = dsp.SampleRateConverter(InputSampleRate=44100,OutputSampleRate=16e3,...
Bandwidth=15800);
[~,D] = src.getRateChangeFactors;
% The frame size must be a multiple of 441 (the decimation factor of the
% sample rate converter)
L = floor(22000/D);
frameLength = L*D; % get as close to desired frame size
adr = audioDeviceReader(SampleRate=44100,SamplesPerFrame=frameLength);
buff = dsp.AsyncBuffer;
% These statistic values should come from your training...
M = 0;
S = 1;
while 1
% Read a frame of data from microphone
frame = adr();
% Convert to 16 KHz
frame = src(frame);
% Save to buffer
write(buff,frame)
while buff.NumUnreadSamples >= 512
frame = read(buff,512);
% Pass to network
scores = streamingClassifier(frame,M,S);
% Use the scores any way you want
end
end
Note that you can also potentially feed the network longer frames. That should also work, and is probably more efficient, as the network will run faster if you give it a long input (as opposed to multiple short ones):
% Create a microphone object
%adr = audioDeviceReader(SampleRate=16e3,SamplesPerFrame=512);
src = dsp.SampleRateConverter(InputSampleRate=44100,OutputSampleRate=16e3,Bandwidth=15800);
[~,D] = src.getRateChangeFactors;
% The frame size must be a multiple of 441 (the decimation factor of the
% sample rate converter)
L = floor(22000/D);
frameLength = L*D;
adr = audioDeviceReader(SampleRate=44100,SamplesPerFrame=frameLength);
buff = dsp.AsyncBuffer;
% These statistic values should come from your training...
M = 0;
S = 1;
while 1
% Read a frame of data from microphone
frame = adr();
% Convert to 16 KHz
frame = src(frame);
% Save to buffer
write(buff,frame)
N = buff.NumUnreadSamples;
L = floor(N/512);
if L>0
frame = read(buff,512*L);
% Pass to network
scores = streamingClassifier(frame,M,S);
% Use the scores any way you want
end
end
If you can't change the frame size on the microphone, then you can handle that using another buffer, for example:
% Create a microphone object
%adr = audioDeviceReader(SampleRate=16e3,SamplesPerFrame=512);
src = dsp.SampleRateConverter(InputSampleRate=44100,OutputSampleRate=16e3,Bandwidth=15800);
[~,D] = src.getRateChangeFactors;
% The frame size must be a multiple of 441 (the decimation factor of the
% sample rate converter)
L = floor(22000/D);
frameLength = L*D;
adr = audioDeviceReader(SampleRate=44100,SamplesPerFrame=22000);
buffSRC = dsp.AsyncBuffer;
buff = dsp.AsyncBuffer;
% These statistic values should come from your training...
M = 0;
S = 1;
while 1
% Read a frame of data from microphone
frame = adr();
write(buffSRC,frame);
frame = read(buffSRC,frameLength);
% Convert to 16 KHz
frame = src(frame);
% Save to buffer
write(buff,frame)
N = buff.NumUnreadSamples;
L = floor(N/512);
if L>0
frame = read(buff,512*L);
% Pass to network
scores = streamingClassifier(frame,M,S);
% Use the scores any way you want
end
end
Arslan Munim
Arslan Munim am 9 Aug. 2022
Thankyou for your support, it was very helpful.
Now I want to use multiple mics for prediction can you please give me some idea how i can use streaming classifier with 3 or 4 mics of the predicition.
Thanks and have a nice day.
Regards,
Arslan

Melden Sie sich an, um zu kommentieren.


jibrahim
jibrahim am 9 Aug. 2022
Hi Arslan,
audioDeviceReader supports multi-mic devices. Use the ChannelMappingSource and ChannelMapping properties to map between device input channels and the output data.
This network was trained on mono data, so, to adapt it to multi-channel data, you either have to retrain your network for multi-channel data, or somehow combine your input channels into one channel (by a weighted sum, or selecting a particular channel, etc) and proceed like above.
  23 Kommentare
jibrahim
jibrahim am 20 Aug. 2022
OK, this helps. You will need other hardware (one device, multiple mics) for the system to recognize it. You could also give the UDP idea a shot, see how viable that is.
Arslan Munim
Arslan Munim am 28 Sep. 2022
Hi again,
I am trying to train my network, with lowering BitsPerSample to 8 before it was 16 BitsPerSample. Every time i try to start training model it throw warning (given below) and terminates.
I try it with different sample rate but it gives same error everytime. I tried to change my layer structure, changing InitialLearnRate',0.001 but still i am getting same warning.
Warning: Training stopped at iteration 1 because training loss is NaN. Predictions using the output network might contain NaN values.
Model:
layers = [ ...
sequenceInputLayer(size(trainingFeatures{1},1))
lstmLayer(100,"OutputMode","sequence")
dropoutLayer(0.1)
lstmLayer(100,"OutputMode","last")
fullyConnectedLayer(5)
softmaxLayer
classificationLayer];
miniBatchSize = 30;
validationFrequency = floor(numel(trainingFeatures)/miniBatchSize);
options = trainingOptions("adam", ...
"MaxEpochs",100, ...
"MiniBatchSize",miniBatchSize, ...
"Plots","training-progress", ...
"Verbose",false, ...
"Shuffle","every-epoch", ...
"LearnRateSchedule","piecewise", ...
"LearnRateDropFactor",0.1, ...
"LearnRateDropPeriod",20,...
'InitialLearnRate',0.001,...
'ValidationData',{validationFeatures,adsValidation.Labels}, ...
'ValidationFrequency',validationFrequency);
Regards,
Arslan

Melden Sie sich an, um zu kommentieren.

Produkte


Version

R2022a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by