
Audio Processing Using Deep Learning

Extend deep learning workflows with audio and speech processing applications

Apply deep learning to audio and speech processing applications by using Deep Learning Toolbox™ together with Audio Toolbox™. For signal processing applications, see Signal Processing Using Deep Learning. For applications in wireless communications, see Wireless Communications Using Deep Learning.


Apps

Audio Labeler: Define and visualize ground-truth labels



Functions

audioDatastore: Datastore for collection of audio files
audioDataAugmenter: Augment audio data
audioFeatureExtractor: Streamline audio feature extraction
ivectorSystem: Create i-vector system
classifySound: Classify sounds in audio signal
crepe: CREPE neural network
crepePreprocess: Preprocess audio for CREPE deep learning network
crepePostprocess: Postprocess output of CREPE deep learning network
openl3: OpenL3 neural network
openl3Features: Extract OpenL3 features
openl3Preprocess: Preprocess audio for OpenL3 feature extraction
pitchnn: Estimate pitch with deep learning neural network
vggish: VGGish neural network
vggishFeatures: Extract VGGish features
vggishPreprocess: Preprocess audio for VGGish feature extraction
yamnet: YAMNet neural network
yamnetGraph: Graph of YAMNet AudioSet ontology
yamnetPreprocess: Preprocess audio for YAMNet classification


Introduction to Deep Learning for Audio Applications (Audio Toolbox)

Learn common tools and workflows to apply deep learning to audio applications.

Classify Sound Using Deep Learning (Audio Toolbox)

Train, validate, and test a simple long short-term memory (LSTM) network to classify sounds.

Transfer Learning with Pretrained Audio Networks

Use transfer learning to retrain YAMNet, a pretrained convolutional neural network (CNN), to classify a new set of audio signals.

Speaker Identification Using Custom SincNet Layer and Deep Learning

Perform speaker identification using a custom deep learning layer that implements a mel-scale filter bank.
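
A mel-scale filter bank places its band edges at equal spacing on the mel scale rather than in hertz, so bands grow wider as frequency increases. The Python sketch below is not Audio Toolbox code. It assumes the common HTK-style conversion m = 2595 log10(1 + f/700) and triangular bands, and the helpers hz_to_mel, mel_to_hz, mel_band_edges, and triangular_weight are hypothetical names used for illustration. SincNet itself learns the cutoff frequencies of sinc band-pass filters, so the layer in the example may parameterize its bands differently.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Convert hertz to mel (assumed HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(num_bands: int, f_min: float, f_max: float) -> list[float]:
    """Return num_bands + 2 frequencies (Hz) equally spaced on the mel scale.

    Each consecutive triple (lo, center, hi) defines one triangular band.
    """
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_hi - m_lo) / (num_bands + 1)
    return [mel_to_hz(m_lo + i * step) for i in range(num_bands + 2)]


def triangular_weight(f: float, lo: float, center: float, hi: float) -> float:
    """Gain of a triangular band (peak 1 at center, 0 outside [lo, hi])."""
    if f <= lo or f >= hi:
        return 0.0
    if f <= center:
        return (f - lo) / (center - lo)
    return (hi - f) / (hi - center)


if __name__ == "__main__":
    edges = mel_band_edges(num_bands=8, f_min=0.0, f_max=8000.0)
    # Center frequencies of the 8 bands; their spacing widens with frequency.
    print([round(c, 1) for c in edges[1:-1]])
```

Equal mel spacing is what makes the low-frequency bands narrow and the high-frequency bands broad, which roughly mirrors human pitch resolution.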

Dereverberate Speech Using Deep Learning Networks

Train a deep learning model that removes reverberation from speech.

Speech Command Recognition in Simulink

Detect the presence of speech commands in audio using a Simulink® model.

Spoken Digit Recognition with Wavelet Scattering and Deep Learning

This example shows how to classify spoken digits using both machine learning and deep learning techniques.

Cocktail Party Source Separation Using Deep Learning Networks

This example shows how to isolate a speech signal using a deep learning network.

Sequential Feature Selection for Audio Features

This example shows a typical workflow for feature selection applied to the task of spoken digit recognition.
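
One common variant of sequential feature selection is a greedy forward search. It starts from an empty feature set and repeatedly adds the single feature that most improves a criterion, stopping when no addition helps. The Python sketch below shows that variant only. score is a hypothetical callback standing in for, for example, the validation accuracy of a classifier trained on the candidate subset. The feature names and weights in the usage block are placeholders, not results from the example.

```python
from typing import Callable, Sequence


def sequential_forward_selection(
    features: Sequence[str],
    score: Callable[[list[str]], float],
    max_features: int,
) -> list[str]:
    """Greedily add the feature that most improves score(subset).

    Stops when max_features are selected, candidates run out, or no
    remaining candidate improves on the best score so far.
    """
    selected: list[str] = []
    remaining = list(features)
    best_score = float("-inf")
    while remaining and len(selected) < max_features:
        # Evaluate each remaining feature added to the current subset.
        cand_score, cand = max((score(selected + [f]), f) for f in remaining)
        if cand_score <= best_score:
            break  # no candidate improves the criterion
        selected.append(cand)
        remaining.remove(cand)
        best_score = cand_score
    return selected


if __name__ == "__main__":
    # Placeholder criterion: each feature contributes a fixed amount.
    weights = {"mfcc": 0.5, "pitch": 0.3, "spectralCentroid": 0.1, "zcr": -0.2}
    chosen = sequential_forward_selection(
        list(weights), lambda s: sum(weights[f] for f in s), max_features=4
    )
    print(chosen)
```

Forward selection needs at most on the order of N squared criterion evaluations for N candidate features, instead of the 2^N evaluations an exhaustive subset search would require. The trade-off is that the greedy search can miss feature combinations that only help jointly.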

Learn Pre-Emphasis Filter Using Deep Learning

Use a convolutional deep network to learn a pre-emphasis filter for speech recognition.
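
A conventional, fixed pre-emphasis filter is the first-order FIR high-pass y[n] = x[n] - alpha * x[n-1], with alpha commonly around 0.95 to 0.97. Viewed this way, learning the filter amounts to treating its taps as trainable weights of a 1-D convolution. The Python sketch below applies only the fixed filter; fir_filter and pre_emphasis are hypothetical helpers, alpha = 0.97 is an assumed default, and the network training from the example is not shown.

```python
def fir_filter(x: list[float], taps: list[float]) -> list[float]:
    """Causal FIR filter y[n] = sum_k taps[k] * x[n - k], zero initial state."""
    return [
        sum(t * x[n - k] for k, t in enumerate(taps) if n - k >= 0)
        for n in range(len(x))
    ]


def pre_emphasis(x: list[float], alpha: float = 0.97) -> list[float]:
    """Fixed pre-emphasis: a 2-tap FIR filter with taps [1, -alpha]."""
    return fir_filter(x, [1.0, -alpha])


if __name__ == "__main__":
    # A constant (DC) input is attenuated after the first sample, while an
    # alternating (high-frequency) input is amplified.
    print(pre_emphasis([1.0, 1.0, 1.0, 1.0], alpha=0.5))
    print(pre_emphasis([1.0, -1.0, 1.0, -1.0], alpha=0.5))
```

Boosting high frequencies this way offsets the spectral tilt of voiced speech before feature extraction. A learned filter replaces the fixed taps [1, -alpha] with values tuned for the downstream recognition task.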

Featured Examples