Extract cepstral features from audio segment
The cepstralFeatureExtractor System object™ extracts cepstral features from an audio segment. Cepstral features are commonly
used to characterize speech and music signals.
To extract cepstral features:
Create the cepstralFeatureExtractor object and set its properties.
Call the object with arguments, as if it were a function.
To learn more about how System objects work, see What Are System Objects? (MATLAB).
cepFeatures = cepstralFeatureExtractor
cepFeatures = cepstralFeatureExtractor(Name,Value)
cepFeatures = cepstralFeatureExtractor creates a System object, cepFeatures, that calculates cepstral features independently across each input channel. Columns of the input are treated as individual channels.
cepFeatures = cepstralFeatureExtractor(Name,Value) sets each property Name to the specified Value. Unspecified properties have default values.
Example: cepFeatures = cepstralFeatureExtractor('InputDomain','Frequency','SampleRate',fs,'LogEnergy','Replace') accepts a signal in the frequency domain, sampled at fs Hz. The first element of the coefficients vector is replaced by the log energy value.
Unless otherwise indicated, properties are nontunable, which means you cannot change their
values after calling the object. Objects lock when you call them, and the
release function unlocks them.
If a property is tunable, you can change its value at any time.
For more information on changing property values, see System Design in MATLAB Using System Objects (MATLAB).
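The lock-and-release behavior described above can be sketched as follows. This is a minimal illustration, not a prescribed workflow: the noise frame x and the new NumCoeffs value of 10 are assumptions chosen only for the example, and the code requires Audio Toolbox.

```matlab
% Create the object; properties can still be set at this point.
cepFeatures = cepstralFeatureExtractor('SampleRate',16000);

% Hypothetical input: one 512-sample frame of noise.
x = randn(512,1);

% The first call locks the object.
coeffsFirst = cepFeatures(x);

% Nontunable properties can be changed again only after release.
release(cepFeatures);
cepFeatures.NumCoeffs = 10;

% With the default LogEnergy setting ('Append'), the output has
% 1 + NumCoeffs rows.
coeffsSecond = cepFeatures(x);
```

Setting NumCoeffs without the release call would produce an error, because the object is locked after its first call.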
InputDomain— Domain of input signal
Domain of the input signal, specified as either 'Time' or 'Frequency'.
NumCoeffs— Number of coefficients to return
13 (default) | positive integer
Number of coefficients to return, specified as an integer in the range [2, v], where v is the number of valid passbands.
The number of valid passbands is defined as sum(BandEdges <= floor(fs/2))-2. A passband is valid if its edges fall below fs/2, where:
BandEdges –– Vector containing the band edges of the filter bank, specified through the BandEdges property.
fs –– Sample rate of the input audio signal, specified through the SampleRate property.
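As a rough sketch of the passband count, assume that it is the number of band edges at or below floor(fs/2), minus two (each triangular filter spans three consecutive edges). The band edge values and sample rate below are hypothetical and chosen only to make the arithmetic easy to follow.

```matlab
fs = 8000;   % assumed sample rate (Hz)

% Hypothetical band edges (Hz); the last one lies above fs/2.
bandEdges = [100 200 300 400 600 900 1400 2100 3200 4800];

% Count the edges that do not exceed the Nyquist frequency.
numUsableEdges = sum(bandEdges <= floor(fs/2));

% Each passband needs a lower edge, a center, and an upper edge,
% so there are two fewer passbands than usable edges.
numValidPassbands = numUsableEdges - 2;
```

Under these assumptions, nine edges are usable and NumCoeffs could be set to any integer from 2 to 7.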
FFTLength— FFT length
[] (default) | positive integer
FFT length, specified as a positive integer. The default, [], means that the FFT length is equal to the number of rows in the input signal.
To enable this property, set InputDomain to 'Time'.
LogEnergy— Specify how the log energy is shown
Specify how the log energy is shown in the coefficients vector output, specified as:
'Append' –– The object prepends the log energy to the coefficients vector. The length of the coefficients vector is 1 + NumCoeffs.
'Replace' –– The object replaces the first coefficient with the log energy of the signal. The length of the coefficients vector is NumCoeffs.
'Ignore' –– The object does not calculate or return the log energy.
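The three settings can be pictured with plain vectors. In this sketch, c and logE are hypothetical stand-ins for the cepstral coefficients and the log energy. The 'Ignore' case is assumed to return the NumCoeffs coefficients unchanged.

```matlab
c = (1:13)';     % hypothetical cepstral coefficients (NumCoeffs = 13)
logE = -2.5;     % hypothetical log energy value

outAppend  = [logE; c];          % 'Append': log energy placed first
outReplace = [logE; c(2:end)];   % 'Replace': first coefficient overwritten
outIgnore  = c;                  % 'Ignore': log energy not included
```

The lengths are 14, 13, and 13, matching 1 + NumCoeffs for 'Append' and NumCoeffs otherwise.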
SampleRate— Input sample rate (Hz)
16000 (default) | positive scalar
Input sample rate in Hz, specified as a real positive scalar.
BandEdges— Band edges of auditory filter bank (Hz)
Band edges of the filter bank in Hz, specified as a nonnegative monotonically increasing row vector in the range [0, ∞). The maximum band edge frequency can be any finite number. The number of band edges must be in the range [4, 80].
The default band edges are spaced linearly for the first ten and then logarithmically after. The default band edges are set as recommended by [1].
FilterBankDesignDomain— Domain for filter bank design
Domain for filter bank design, specified as either 'Hz' or 'Bin'. The filter bank is designed as overlapped triangles with band edges specified by the BandEdges property.
The BandEdges property is specified in Hz. When you set the design domain to:
'Hz' –– Filter bank triangles are drawn in Hz and are mapped onto bins.
Here is an example that plots the filter bank in bins when the FilterBankDesignDomain is set to 'Hz':
[audioFile, fs] = audioread('NoisySpeech-16-22p5-mono-5secs.wav');
duration = round(0.02*fs); % 20 ms audio segment
audioSegment = audioFile(5500:5500+duration-1);
cepFeatures = cepstralFeatureExtractor('SampleRate',fs)
cepFeatures =
  cepstralFeatureExtractor with properties:
   Properties
               InputDomain: 'Time'
                 NumCoeffs: 13
                 FFTLength: []
                 LogEnergy: 'Append'
                SampleRate: 22500
   Advanced Properties
                 BandEdges: [1×42 double]
    FilterBankDesignDomain: 'Hz'
   FilterBankNormalization: 'Bandwidth'
[coeffs,delta,deltaDelta] = cepFeatures(audioSegment);
Using the getFilters function, get the filter bank. Plot the filter bank.
[filterbank, freq] = getFilters(cepFeatures);
plot(freq(1:150),filterbank(1:150,:))
For details, see [1].
'Bin' –– The band edge frequencies in Hz are converted to bins. The filter bank triangles are drawn symmetrically in bins.
Change the FilterBankDesignDomain property to 'Bin':
release(cepFeatures);
cepFeatures.FilterBankDesignDomain = 'Bin';
[coeffs,delta,deltaDelta] = cepFeatures(audioSegment);
[filterbank, freq] = getFilters(cepFeatures);
plot(freq(1:150),filterbank(1:150,:))
For details, see [2].
FilterBankNormalization— Normalize filter bank
Normalization technique used on the weights of the filter bank, specified as:
'Bandwidth' –– The weights of each bandpass filter are
normalized by the corresponding bandwidth of the filter.
'Area' –– The weights of each bandpass filter are
normalized by the corresponding area of the bandpass filter.
'None' –– The weights of the filter are not normalized.
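To make the difference between the options concrete, the following sketch builds one triangular filter on a frequency grid and scales it two ways. The edge frequencies are hypothetical, and the scale factors (dividing by the edge-to-edge bandwidth or by the area under the triangle) are one plausible reading of the descriptions above, not a statement of the exact factors the object applies.

```matlab
f = 0:10:1000;                       % frequency grid (Hz)
lo = 200; center = 400; hi = 800;    % hypothetical band edges (Hz)

% Unnormalized triangle with a peak of 1 at the center frequency.
tri = max(0, min((f-lo)/(center-lo), (hi-f)/(hi-center)));

% 'Bandwidth'-style scaling: divide by the width of the passband.
bandwidthNorm = tri/(hi-lo);

% 'Area'-style scaling: divide by the area under the triangle,
% so the scaled filter integrates to 1.
areaNorm = tri/trapz(f,tri);
```

Because the area of a triangle is half its base times its height, the two scalings differ here by a constant factor of two.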
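[coeffs,delta,deltaDelta] = cepFeatures(audioIn) returns the cepstral coefficients, the delta values, and the delta-delta values of the audio input.
The log energy value prepends the coefficients vector or replaces the first element of
the coefficients vector based on whether you set the LogEnergy property to 'Append' or
'Replace'. For details, see LogEnergy.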
audioIn— Audio input to cepstral feature extractor
Audio input to the cepstral feature extractor, specified as a column vector or a matrix. If specified as a matrix, the columns are treated as independent audio channels.
coeffs— Cepstral coefficients
Cepstral coefficients, returned as a column vector or a matrix. If the coefficients matrix is an N-by-M matrix, N is determined by the values you specify in the NumCoeffs and LogEnergy properties, and M equals the number of input audio channels.
When the LogEnergy property is set to:
'Append' –– The object prepends the log energy value to the coefficients vector. The length of the coefficients vector is 1 + NumCoeffs. This is the default setting of the LogEnergy property.
'Replace' –– The object replaces the first coefficient with the log energy of the signal. The length of the coefficients vector is NumCoeffs.
'Ignore' –– The object does not calculate or return the log energy.
delta— Change in coefficients
Change in coefficients over consecutive calls to the algorithm, returned as a vector or a matrix. The delta array is of the same size and data type as the coeffs array.
In this example, cepFeatures is the cepstral feature extractor that accepts an audio input signal sampled at 12 kHz. Stream in three segments of the audio signal on three consecutive calls to the object algorithm. Return the cepstral coefficients of the filter bank and the corresponding delta values.
cepFeatures = cepstralFeatureExtractor('SampleRate',12000);
[coeff1,delta1] = cepFeatures(audioIn);
[coeff2,delta2] = cepFeatures(audioIn);
[coeff3,delta3] = cepFeatures(audioIn);
delta2 is computed as coeff2-coeff1, while delta3 is computed as coeff3-coeff2. The initial array, delta1, is an array of zeros.
deltaDelta— Change in delta values
Change in delta values over consecutive calls to the algorithm, returned as a vector or a matrix. The deltaDelta array is the same size and data type as the coeffs and delta arrays.
In this example, consecutive calls to the cepstral feature extractor algorithm return the deltaDelta values in addition to the coefficients and delta values.
cepFeatures = cepstralFeatureExtractor('SampleRate',12000);
[coeff1,delta1,deltaDelta1] = cepFeatures(audioIn);
[coeff2,delta2,deltaDelta2] = cepFeatures(audioIn);
[coeff3,delta3,deltaDelta3] = cepFeatures(audioIn);
deltaDelta2 is computed as delta2-delta1, while deltaDelta3 is computed as delta3-delta2. The initial array, deltaDelta1, is an array of zeros.
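The first-difference bookkeeping described for delta and deltaDelta can be reproduced without the object. This sketch uses a small hypothetical matrix, frames, in which each column stands for the coefficient vector returned by one call. It only mimics the differencing and is not the object's implementation.

```matlab
% Hypothetical coefficient vectors, one column per consecutive call.
frames = [1 4 9;
          2 3 7];
numCalls = size(frames,2);

deltas = zeros(size(frames));
deltaDeltas = zeros(size(frames));

prevCoeffs = frames(:,1);               % makes the first delta zero
prevDelta = zeros(size(frames,1),1);    % makes the first deltaDelta zero

for k = 1:numCalls
    coeffs = frames(:,k);
    delta = coeffs - prevCoeffs;        % change in coefficients
    deltaDelta = delta - prevDelta;     % change in delta
    deltas(:,k) = delta;
    deltaDeltas(:,k) = deltaDelta;
    prevCoeffs = coeffs;                % state carried to the next call
    prevDelta = delta;
end
```

Because the first delta is zero, the second deltaDelta equals the second delta, which is the same pattern the documented example output shows for deltaTwo and deltaDeltaTwo.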
To use an object function, specify the System object as the first input argument. For example, to release system resources of a System object named obj, use this syntax:
release(obj)
Extract the mel frequency cepstral coefficients and the log energy values of segments in a speech file. Return delta, the difference between the current and the previous cepstral coefficients, and deltaDelta, the difference between the current and the previous delta values. The log energy value that the object computes can prepend the coefficients vector or replace the first element of the coefficients vector, depending on whether you set the LogEnergy property of the cepstralFeatureExtractor object to 'Append' or 'Replace'.
Read an audio signal from the 'SpeechDFT-16-8-mono-5secs.wav' file. Extract a 40 ms segment from the audio data. Create a cepstralFeatureExtractor object. The cepstral coefficients computed by the default object are the mel frequency coefficients. In addition, the object computes the log energy, delta, and delta-delta values of the audio segment.
[audioFile, fs] = audioread('SpeechDFT-16-8-mono-5secs.wav');
duration = round(0.04*fs); % 40 ms
audioSegment = audioFile(5500:5500+duration-1);
cepFeatures = cepstralFeatureExtractor('SampleRate',fs)
cepFeatures =
  cepstralFeatureExtractor with properties:
   Properties
    InputDomain: 'Time'
      NumCoeffs: 13
      FFTLength: []
      LogEnergy: 'Append'
     SampleRate: 8000
The LogEnergy property is set to 'Append'. The first element in the coefficients vector is the log energy value and the remaining elements are the 13 cepstral coefficients computed by the object. The number of cepstral coefficients is determined by the value you specify in the NumCoeffs property.
[coeffs,delta,deltaDelta] = cepFeatures(audioSegment)
coeffs = 14×1
    3.8281
  -10.7136
    2.5113
    0.8357
    2.0019
    1.0714
   -0.7524
    0.6335
   -0.3084
   -0.2283
      ⋮
delta = 14×1
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
      ⋮
deltaDelta = 14×1
     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
      ⋮
The initial values for the delta and deltaDelta arrays are always zero. Consider another 40 ms audio segment in the file and extract the cepstral features from this segment.
audioSegmentTwo = audioFile(5820:5820+duration-1);
[coeffsTwo,deltaTwo,deltaDeltaTwo] = cepFeatures(audioSegmentTwo)
coeffsTwo = 14×1
    3.0899
  -11.8236
    1.3105
    2.3195
    1.6894
   -0.0264
    0.6509
    0.8009
   -0.5502
    0.3022
      ⋮
deltaTwo = 14×1
   -0.7382
   -1.1100
   -1.2008
    1.4838
   -0.3125
   -1.0978
    1.4033
    0.1674
   -0.2418
    0.5306
      ⋮
deltaDeltaTwo = 14×1
   -0.7382
   -1.1100
   -1.2008
    1.4838
   -0.3125
   -1.0978
    1.4033
    0.1674
   -0.2418
    0.5306
      ⋮
Verify that the difference between the coeffsTwo and coeffs vectors equals the delta value, deltaTwo.
isequal(coeffsTwo-coeffs,deltaTwo)
ans = logical
   1
Verify that the difference between the deltaTwo and delta vectors equals the deltaDelta value, deltaDeltaTwo.
isequal(deltaTwo-delta,deltaDeltaTwo)
ans = logical
   1
Many feature extraction techniques operate in the frequency domain, so it is efficient to convert an audio signal to the frequency domain only once and reuse the result. In this example, you convert a streaming audio signal to the frequency domain and feed that signal into a voice activity detector. If speech is present, mel-frequency cepstral coefficients (MFCC) features are extracted from the frequency-domain signal using the
cepstralFeatureExtractor System object™.
Create a dsp.AudioFileReader System object to read from an audio file.
fileReader = dsp.AudioFileReader('Counting-16-44p1-mono-15secs.wav');
fs = fileReader.SampleRate;
Process the audio in 30 ms frames with a 10 ms hop. Create a default
dsp.AsyncBuffer object to manage overlap between audio frames.
samplesPerFrame = ceil(0.03*fs);
samplesPerHop = ceil(0.01*fs);
samplesPerOverlap = samplesPerFrame - samplesPerHop;
fileReader.SamplesPerFrame = samplesPerHop;
buffer = dsp.AsyncBuffer;
Create a voiceActivityDetector System object and a cepstralFeatureExtractor System object. Specify that they operate in the frequency domain. Create a dsp.SignalSink to log the extracted cepstral features.
VAD = voiceActivityDetector('InputDomain','Frequency');
cepFeatures = cepstralFeatureExtractor('InputDomain','Frequency','SampleRate',fs,'LogEnergy','Replace');
sink = dsp.SignalSink;
In an audio stream loop:
Read one hop's worth of samples from the audio file and save the samples into the buffer.
Read a frame from the
buffer with specified overlap from the previous frame.
Call the voice activity detector to get the probability of speech for the frame under analysis.
If the frame under analysis has a probability of speech greater than 0.75, extract cepstral features and log the features using the signal sink. Otherwise (probability of speech less than or equal to 0.75), write a vector of NaNs to the sink.
threshold = 0.75;
nanVector = nan(1,13);
while ~isDone(fileReader)
    audioIn = fileReader();
    write(buffer,audioIn);
    overlappedAudio = read(buffer,samplesPerFrame,samplesPerOverlap);
    X = fft(overlappedAudio,2048);
    probabilityOfSpeech = VAD(X);
    if probabilityOfSpeech > threshold
        xFeatures = cepFeatures(X);
        sink(xFeatures')
    else
        sink(nanVector)
    end
end
Visualize the cepstral coefficients over time.
timeVector = linspace(0,15,size(sink.Buffer,1));
plot(timeVector,sink.Buffer)
xlabel('Time (s)')
ylabel('MFCC Amplitude')
legend('Log-Energy','c1','c2','c3','c4','c5','c6','c7','c8','c9','c10','c11','c12')
Auditory cepstrum coefficients are popular features extracted from speech signals for use in recognition tasks. In the source-filter model of speech, cepstral coefficients are understood to represent the filter (vocal tract). The vocal tract frequency response is relatively smooth, whereas the source of voiced speech can be modeled as an impulse train. As a result, the vocal tract can be estimated by the spectral envelope of a speech segment.
The motivating idea of cepstral coefficients is to compress information about the vocal tract (smoothed spectrum) into a small number of coefficients based on an understanding of the cochlea. Although there is no hard standard for calculating the coefficients, the basic steps are outlined by the diagram.
The windowing is done by a Hamming function. The default filter bank linearly spaces the first 10 triangular filters and logarithmically spaces the remaining filters.
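The general shape of such a pipeline can be sketched in base MATLAB. Everything specific here is an assumption made for illustration: the synthetic test frame, the hypothetical band edges, the unnormalized filters, and the use of an unscaled DCT-II as the final transform. It follows the common window, spectrum, filter bank, log, and transform sequence. It is not a reproduction of the object's algorithm.

```matlab
fs = 8000;                                    % assumed sample rate (Hz)
N = 256;                                      % frame length in samples
t = (0:N-1)'/fs;
x = sin(2*pi*440*t) + 0.5*sin(2*pi*1200*t);   % synthetic test frame

% 1. Hamming window, written out so that no toolbox is needed.
w = 0.54 - 0.46*cos(2*pi*(0:N-1)'/(N-1));
P = abs(fft(x.*w)).^2;                        % power spectrum
P = P(1:N/2+1);                               % nonnegative frequencies only
f = (0:N/2)'*fs/N;                            % bin frequencies (Hz)

% 2. Triangular filter bank built from hypothetical band edges (Hz).
edges = [100 300 600 1000 1500 2200 3000 3900];
numFilters = numel(edges) - 2;
fb = zeros(numel(f),numFilters);
for m = 1:numFilters
    lo = edges(m); center = edges(m+1); hi = edges(m+2);
    fb(:,m) = max(0, min((f-lo)/(center-lo), (hi-f)/(hi-center)));
end

% 3. Log of the energy collected by each filter.
logBandEnergy = log(fb'*P + eps);

% 4. Unscaled DCT-II of the log energies gives cepstral coefficients.
n = 0:numFilters-1;
k = (0:numFilters-1)';
D = cos(pi*k*(2*n+1)/(2*numFilters));
coeffs = D*logBandEnergy;
```

The first row of D is all ones, so the first coefficient is simply the sum of the log band energies. Higher coefficients capture progressively finer detail of the smoothed spectrum.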
If the input (x) is a time-domain signal, the log energy is computed using the following equation:
If the input (x) is a frequency-domain signal, the log energy is computed using the following equation:
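The equations themselves are not reproduced here, so the following is only a sketch of one consistent pair of definitions: the log of the sum of squared samples for a time-domain frame, and the log of the summed squared FFT magnitudes divided by the FFT length for a frequency-domain frame. Both forms, and the choice of an FFT length equal to the frame length, are assumptions. Under them, Parseval's relation makes the two values agree.

```matlab
rng(0);                 % fixed seed so the example is repeatable
N = 64;                 % frame length and, by assumption, FFT length
x = randn(N,1);         % hypothetical time-domain frame
X = fft(x);             % the same frame in the frequency domain

% Assumed time-domain form: log of the frame energy.
logEnergyTime = log(sum(x.^2));

% Assumed frequency-domain form: Parseval's relation gives
% sum(x.^2) == sum(abs(X).^2)/N for an N-point FFT.
logEnergyFreq = log(sum(abs(X).^2)/N);
```

If the FFT were zero-padded to a length longer than the frame, the divisor would be that FFT length rather than the number of samples in the frame.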
[1] Auditory Toolbox. https://engineering.purdue.edu/~malcolm/interval/1998-010/AuditoryToolboxTechReport.pdf
[2] ETSI ES 201 108 V1.1.3 (2003-09). https://www.etsi.org/deliver/etsi_es/201100_201199/201108/01.01.03_60/es_201108v010103p.pdf
Usage notes and limitations:
System Objects in MATLAB Code Generation (MATLAB Coder)