# cepstralFeatureExtractor

Extract cepstral features from audio segment

## Description

The `cepstralFeatureExtractor` System object™ extracts cepstral features from an audio segment. Cepstral features are commonly used to characterize speech and music signals.

To extract cepstral features:

1. Create the `cepstralFeatureExtractor` object and set its properties.

2. Call the object with arguments, as if it were a function.

## Creation

### Syntax

``cepFeatures = cepstralFeatureExtractor``
``cepFeatures = cepstralFeatureExtractor(Name,Value)``

### Description

`cepFeatures = cepstralFeatureExtractor` creates a System object, `cepFeatures`, that calculates cepstral features independently across each input channel. Columns of the input are treated as individual channels.

`cepFeatures = cepstralFeatureExtractor(Name,Value)` sets each property `Name` to the specified `Value`. Unspecified properties have default values.

Example: `cepFeatures = cepstralFeatureExtractor('InputDomain','Frequency','SampleRate',fs,'LogEnergy','Replace')` accepts a signal in the frequency domain, sampled at `fs` Hz. The first element of the coefficients vector is replaced by the log energy value.

## Properties


Unless otherwise indicated, properties are nontunable, which means you cannot change their values after calling the object. Objects lock when you call them, and the `release` function unlocks them.

If a property is tunable, you can change its value at any time.
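The locking behavior described above can be sketched as follows. This is a minimal illustration, assuming Audio Toolbox is installed; the noise input and the specific property values are placeholders chosen for the example, not recommended settings.

```matlab
% Sketch of locking, tunable properties, and release (assumes Audio Toolbox).
cepFeatures = cepstralFeatureExtractor('SampleRate',16000);

audioIn = randn(512,1);                   % placeholder input: one channel of noise
coeffs = cepFeatures(audioIn);            % the first call locks the object
lockedAfterCall = isLocked(cepFeatures);

cepFeatures.SampleRate = 44100;           % SampleRate is tunable, so it can change while locked

release(cepFeatures);                     % unlock before changing nontunable properties
lockedAfterRelease = isLocked(cepFeatures);
cepFeatures.NumCoeffs = 10;               % NumCoeffs is nontunable; set it only when unlocked
```

Attempting to set a nontunable property such as `NumCoeffs` while the object is still locked is expected to raise an error, which is why the sketch calls `release` first.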

`FilterBank`: Type of filter bank, specified as either `'Mel'` or `'Gammatone'`. When `FilterBank` is set to `'Mel'`, the object computes the mel frequency cepstral coefficients (MFCC). When `FilterBank` is set to `'Gammatone'`, the object computes the gammatone cepstral coefficients (GTCC).

Data Types: `char` | `string`

`InputDomain`: Domain of the input signal, specified as either `'Time'` or `'Frequency'`.

Data Types: `char` | `string`

`NumCoeffs`: Number of coefficients to return, specified as an integer in the range [2, v], where v is the number of valid passbands. The number of valid passbands depends on the type of filter bank:

• `Mel` –– The number of valid passbands is defined as ```sum(BandEdges <= floor(SampleRate/2))-2```.

• `Gammatone` –– The number of valid passbands is defined as `ceil(hz2erb(FrequencyRange(2))-hz2erb(FrequencyRange(1)))`.

Data Types: `single` | `double`
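As a worked illustration of the `Mel` formula above, the following sketch counts the valid passbands for a made-up set of band edges. The band edge values and sample rate are hypothetical and are not the object's defaults. The subtraction of 2 reflects the usual construction in which each triangular filter spans three consecutive band edges, so N usable edges yield N-2 filters; that interpretation is an assumption, not something this page states.

```matlab
% Hypothetical band edges in Hz (not the object's default BandEdges).
bandEdges = [100 200 300 400 600 800 1200 1600 2400 3200 4800 6400];
sampleRate = 8000;

% Only edges at or below the Nyquist frequency can bound a valid passband.
nyquist = floor(sampleRate/2);
numUsableEdges = sum(bandEdges <= nyquist);   % 10 edges are <= 4000 Hz

% Formula quoted above for the Mel filter bank.
numValidPassbands = numUsableEdges - 2;       % 8, so NumCoeffs must lie in [2, 8]
```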

`FFTLength`: FFT length, specified as a positive integer. The default, `[]`, means that the FFT length is equal to the number of rows in the input signal.

#### Dependencies

To enable this property, set `InputDomain` to `'Time'`.

Data Types: `single` | `double` | `int8` | `int16` | `int32` | `int64` | `uint8` | `uint16` | `uint32` | `uint64`

`LogEnergy`: How the log energy appears in the output coefficients vector, specified as:

• `'Append'` –– The object prepends the log energy to the coefficients vector. The length of the coefficients vector is 1 + `NumCoeffs`.

• `'Replace'` –– The object replaces the first coefficient with the log energy of the signal. The length of the coefficients vector is `NumCoeffs`.

• `'Ignore'` –– The object does not calculate or return the log energy.

Data Types: `char` | `string`
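The effect of `LogEnergy` on the length of the output can be checked with a short sketch like the one below. It assumes Audio Toolbox is installed, and the noise input and sample rate are placeholders; only the `'Append'` and `'Replace'` lengths stated above are checked.

```matlab
% Sketch: coefficient vector length under two LogEnergy settings
% (assumes Audio Toolbox; the noise input is a placeholder).
fs = 16000;
audioIn = randn(512,1);

appendExtractor  = cepstralFeatureExtractor('SampleRate',fs,'LogEnergy','Append');
replaceExtractor = cepstralFeatureExtractor('SampleRate',fs,'LogEnergy','Replace');

coeffsAppend  = appendExtractor(audioIn);    % log energy first, then NumCoeffs coefficients
coeffsReplace = replaceExtractor(audioIn);   % log energy takes the place of the first coefficient

lengthAppend  = size(coeffsAppend,1);        % expected: 1 + NumCoeffs
lengthReplace = size(coeffsReplace,1);       % expected: NumCoeffs
```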

`SampleRate`: Input sample rate in Hz, specified as a real positive scalar.

Tunable: Yes

Data Types: `single` | `double`

`BandEdges`: Band edges of the filter bank in Hz, specified as a nonnegative, monotonically increasing row vector in the range [0, ∞). The maximum band edge frequency can be any finite number. The number of band edges must be in the range [4, 80].

The default band edges are spaced linearly for the first ten and then logarithmically after. The default band edges are set as recommended by [1].

#### Dependencies

To enable this property, set `FilterBank` to `Mel`.

Data Types: `single` | `double`

`FrequencyRange`: Frequency range of the filter bank in Hz, specified as a positive, monotonically increasing two-element row vector. The maximum frequency can be any finite number. The center frequencies of the filter bank are equally spaced between `hz2erb(FrequencyRange(1))` and `hz2erb(FrequencyRange(2))` on the ERB scale.

#### Dependencies

To enable this property, set `FilterBank` to `Gammatone`.

Data Types: `single` | `double`
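The ERB-scale spacing described above can be sketched with the Audio Toolbox conversion functions `hz2erb` and `erb2hz`. The frequency range below is hypothetical, and using the valid-passband count as the number of center frequencies is an assumption made for illustration; the object's internal design may differ in detail.

```matlab
% Sketch: center frequencies equally spaced on the ERB scale
% (assumes Audio Toolbox for hz2erb and erb2hz; the range is hypothetical).
frequencyRange = [50 8000];                   % Hz

erbRange = hz2erb(frequencyRange);            % map both ends onto the ERB scale
numBands = ceil(erbRange(2) - erbRange(1));   % valid-passband count from the Gammatone formula

% Equal spacing on the ERB scale, then conversion back to Hz.
centerERB = linspace(erbRange(1),erbRange(2),numBands);
centerHz  = erb2hz(centerERB);
```

Because the spacing is uniform in ERB rather than in Hz, the resulting center frequencies are packed more densely at low frequencies than at high frequencies.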

`FilterBankDesignDomain`: Domain for filter bank design, specified as either `'Hz'` or `'Bin'`. The filter bank is designed as overlapped triangles with band edges specified by the `BandEdges` property.

The `BandEdges` property is specified in Hz. When you set the design domain to:

• `'Hz'` –– Filter bank triangles are drawn in Hz and are mapped onto bins.

Here is an example that plots the filter bank in bins when the `FilterBankDesignDomain` is set to `'Hz'`:

```
[audioFile, fs] = audioread('NoisySpeech-16-22p5-mono-5secs.wav');
duration = round(0.02*fs); % 20 ms audio segment
audioSegment = audioFile(5500:5500+duration-1);
cepFeatures = cepstralFeatureExtractor('SampleRate',fs)
```

```
cepFeatures = 
  cepstralFeatureExtractor with properties:

   Properties
                InputDomain: 'Time'
                  NumCoeffs: 13
                  FFTLength: []
                  LogEnergy: 'Append'
                 SampleRate: 22500

   Advanced Properties
                  BandEdges: [1×42 double]
     FilterBankDesignDomain: 'Hz'
    FilterBankNormalization: 'Bandwidth'
```

Pass the audio segment as an input to the cepstral feature extractor algorithm to lock the object.

```
[coeffs,delta,deltaDelta] = cepFeatures(audioSegment);
```

Using the `getFilters` function, get the filter bank. Plot the filter bank.

```
[filterbank, freq] = getFilters(cepFeatures);
plot(freq(1:150),filterbank(1:150,:))
```

For details, see [1].

• `'Bin'` –– The band edge frequencies, specified in Hz, are converted to bins. The filter bank triangles are drawn symmetrically in bins.

Change the `FilterBankDesignDomain` property to `'Bin'`:

```
release(cepFeatures);
cepFeatures.FilterBankDesignDomain = 'Bin';
[coeffs,delta,deltaDelta] = cepFeatures(audioSegment);
[filterbank, freq] = getFilters(cepFeatures);
plot(freq(1:150),filterbank(1:150,:))
```

For details, see [2].

#### Dependencies

To enable this property, set `FilterBank` to `Mel`.

Data Types: `char` | `string`

`FilterBankNormalization`: Normalization technique used on the weights of the filter bank, specified as:

• `'Bandwidth'` –– The weights of each bandpass filter are normalized by the corresponding bandwidth of the filter.

• `'Area'` –– The weights of each bandpass filter are normalized by the corresponding area of the bandpass filter.

• `'None'` –– The weights of the filter are not normalized.

Data Types: `char` | `string`
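To make the three normalization options concrete, the sketch below builds a single triangular filter in base MATLAB and scales it in each way. The band edges and frequency grid are hypothetical, and the exact scale factors the object applies are not stated on this page, so the divisions shown are one plausible reading of "normalized by bandwidth" and "normalized by area" rather than the object's actual implementation.

```matlab
% Sketch: one triangular filter under the three normalization options.
% All values are hypothetical; scale factors are illustrative assumptions.
edges = [300 500 800];            % lower edge, center, upper edge (Hz)
f = (0:10:1000)';                 % frequency grid (Hz)

% Unnormalized triangle: rises from edges(1) to a peak of 1 at edges(2),
% then falls back to zero at edges(3).
rising  = (f - edges(1)) / (edges(2) - edges(1));
falling = (edges(3) - f) / (edges(3) - edges(2));
wNone = max(0, min(rising, falling));        % 'None'

bandwidth  = edges(3) - edges(1);
wBandwidth = wNone / bandwidth;              % 'Bandwidth': divide by the filter bandwidth
wArea      = wNone / sum(wNone);             % 'Area': weights sum to one
```

Both normalizations reduce the weight of wide filters relative to narrow ones, which keeps broad high-frequency bands from dominating the filter bank output.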

## Usage

### Syntax

``[coeffs,delta,deltaDelta] = cepFeatures(audioIn)``

### Description

`[coeffs,delta,deltaDelta] = cepFeatures(audioIn)` returns the cepstral coefficients, the log energy, the delta, and the delta-delta. The log energy value is prepended to the coefficients vector or replaces its first element, depending on whether you set the `LogEnergy` property to `'Append'` or `'Replace'`. For details, see `coeffs`.

### Input Arguments

`audioIn`: Audio input to the cepstral feature extractor, specified as a column vector or a matrix. If specified as a matrix, the columns are treated as independent audio channels.

Data Types: `single` | `double`

### Output Arguments

`coeffs`: Cepstral coefficients, returned as a column vector or a matrix. If the coefficients matrix is an N-by-M matrix, N is determined by the values you specify in the `NumCoeffs` and `LogEnergy` properties. M equals the number of input audio channels.

When the `LogEnergy` property is set to:

• `'Append'` –– The object prepends the log energy value to the coefficients vector. The length of the coefficients vector is 1 + `NumCoeffs`. This is the default setting of the `LogEnergy` property.

• `'Replace'` –– The object replaces the first coefficient with the log energy of the signal. The length of the coefficients vector is `NumCoeffs`.

• `'Ignore'` –– The object does not calculate or return the log energy.

Data Types: `single` | `double`

`delta`: Change in coefficients over consecutive calls to the algorithm, returned as a vector or a matrix. The `delta` array is the same size and data type as the `coeffs` array.

In this example, `cepFeatures` is a cepstral feature extractor that accepts an audio input signal sampled at 12 kHz. Stream in three segments of the audio signal on three consecutive calls to the object. Return the cepstral coefficients and the corresponding `delta` values.

```
cepFeatures = cepstralFeatureExtractor('SampleRate',12000);
[coeff1,delta1] = cepFeatures(audioIn);
[coeff2,delta2] = cepFeatures(audioIn);
[coeff3,delta3] = cepFeatures(audioIn);
```

`delta2` is computed as `coeff2-coeff1`, while `delta3` is computed as `coeff3-coeff2`. The initial array, `delta1`, is an array of zeros.

Data Types: `single` | `double`

`deltaDelta`: Change in `delta` values over consecutive calls to the algorithm, returned as a vector or a matrix. The `deltaDelta` array is the same size and data type as the `coeffs` and `delta` arrays.

In this example, consecutive calls to the cepstral feature extractor algorithm return the `deltaDelta` values in addition to the coefficients and the `delta` values.

```
cepFeatures = cepstralFeatureExtractor('SampleRate',12000);
[coeff1,delta1,deltaDelta1] = cepFeatures(audioIn);
[coeff2,delta2,deltaDelta2] = cepFeatures(audioIn);
[coeff3,delta3,deltaDelta3] = cepFeatures(audioIn);
```

`deltaDelta2` is computed as `delta2-delta1`, while `deltaDelta3` is computed as `delta3-delta2`. The initial array, `deltaDelta1`, is an array of zeros.

Data Types: `single` | `double`
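The difference rules for `delta` and `deltaDelta` can be reproduced with plain arrays, without the object. The following base MATLAB sketch uses made-up coefficient values (two coefficients over three calls) purely to show the arithmetic; the variable names are illustrative.

```matlab
% Made-up coefficients: each column holds the coefficients from one call.
coeffSequence = [1 4 9;
                 2 3 5];

numCalls   = size(coeffSequence,2);
delta      = zeros(size(coeffSequence));   % first call: delta is all zeros
deltaDelta = zeros(size(coeffSequence));   % first call: deltaDelta is all zeros

for k = 2:numCalls
    % delta is the change in coefficients since the previous call.
    delta(:,k) = coeffSequence(:,k) - coeffSequence(:,k-1);
    % deltaDelta is the change in delta since the previous call.
    deltaDelta(:,k) = delta(:,k) - delta(:,k-1);
end

% delta      = [0 3 5; 0 1 2]
% deltaDelta = [0 3 2; 0 1 1]
```

On the second call `deltaDelta` equals `delta`, because the `delta` from the first call is zero.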

## Object Functions

To use an object function, specify the System object as the first input argument. For example, to release system resources of a System object named `obj`, use this syntax:

`release(obj)`

• `getFilters` –– Get auditory filter bank

• `clone` –– Create duplicate System object

• `isLocked` –– Determine if System object is in use

• `release` –– Release resources and allow changes to System object property values and input characteristics

• `reset` –– Reset internal states of System object

• `step` –– Run System object algorithm

## Examples

Extract the mel frequency cepstral coefficients and the log energy values of segments in a speech file. Return `delta`, the difference between the current and the previous cepstral coefficients, and `deltaDelta`, the difference between the current and the previous `delta` values. The log energy value that the object computes is either prepended to the coefficients vector or replaces its first element, depending on whether you set the `LogEnergy` property of the `cepstralFeatureExtractor` object to `'Append'` or `'Replace'`.

Read an audio signal from the `'SpeechDFT-16-8-mono-5secs.wav'` file. Extract a 40 ms segment from the audio data. Create a `cepstralFeatureExtractor` object. The cepstral coefficients computed by the default object are the mel frequency coefficients. In addition, the object computes the log energy, delta, and delta-delta values of the audio segment.

```
[audioFile, fs] = audioread('SpeechDFT-16-8-mono-5secs.wav');
duration = round(0.04*fs); % 40 ms
audioSegment = audioFile(5500:5500+duration-1);
cepFeatures = cepstralFeatureExtractor('SampleRate',fs)
```

```
cepFeatures = 
  cepstralFeatureExtractor with properties:

   Properties
     FilterBank: 'Mel'
    InputDomain: 'Time'
      NumCoeffs: 13
      FFTLength: []
      LogEnergy: 'Append'
     SampleRate: 8000

  Show all properties
```

The `LogEnergy` property is set to `'Append'`. The first element in the coefficients vector is the log energy value and the remaining elements are the 13 cepstral coefficients computed by the object. The number of cepstral coefficients is determined by the value you specify in the `NumCoeffs` property.

```
[coeffs,delta,deltaDelta] = cepFeatures(audioSegment)
```

```
coeffs = 14×1

    3.8281
  -19.4827
   11.7649
   -6.2989
    5.8894
   -0.3366
    0.9583
    0.8768
   -2.0384
    2.3678
      ⋮
```

```
delta = 14×1

     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
      ⋮
```

```
deltaDelta = 14×1

     0
     0
     0
     0
     0
     0
     0
     0
     0
     0
      ⋮
```

The initial values for the `delta` and `deltaDelta` arrays are always zero. Consider another 40 ms audio segment in the file and extract the cepstral features from this segment.

```
audioSegmentTwo = audioFile(5820:5820+duration-1);
[coeffsTwo,deltaTwo,deltaDeltaTwo] = cepFeatures(audioSegmentTwo)
```

```
coeffsTwo = 14×1

    3.0899
  -20.4756
   10.4455
   -5.8759
    7.2215
   -1.2027
   -0.0236
    1.9183
   -1.2127
    2.0669
      ⋮
```

```
deltaTwo = 14×1

   -0.7382
   -0.9928
   -1.3194
    0.4230
    1.3321
   -0.8661
   -0.9819
    1.0415
    0.8257
   -0.3009
      ⋮
```

```
deltaDeltaTwo = 14×1

   -0.7382
   -0.9928
   -1.3194
    0.4230
    1.3321
   -0.8661
   -0.9819
    1.0415
    0.8257
   -0.3009
      ⋮
```

Verify that the difference between `coeffsTwo` and `coeffs` vectors equals `deltaTwo`.

```
isequal(coeffsTwo-coeffs,deltaTwo)
```

```
ans = logical
   1
```

Verify that the difference between `deltaTwo` and `delta` vectors equals `deltaDeltaTwo`.

```
isequal(deltaTwo-delta,deltaDeltaTwo)
```

```
ans = logical
   1
```

Many feature extraction techniques operate in the frequency domain. Converting an audio signal to the frequency domain once and reusing the result avoids redundant FFT computations. In this example, you convert a streaming audio signal to the frequency domain and feed that signal into a voice activity detector. If speech is present, mel-frequency cepstral coefficient (MFCC) features are extracted from the frequency-domain signal using the `cepstralFeatureExtractor` System object™.

Create a `dsp.AudioFileReader` System object to read from an audio file.

```
fileReader = dsp.AudioFileReader('Counting-16-44p1-mono-15secs.wav');
fs = fileReader.SampleRate;
```

Process the audio in 30 ms frames with a 10 ms hop. Create a default `dsp.AsyncBuffer` object to manage overlap between audio frames.

```
samplesPerFrame = ceil(0.03*fs);
samplesPerHop = ceil(0.01*fs);
samplesPerOverlap = samplesPerFrame - samplesPerHop;

fileReader.SamplesPerFrame = samplesPerHop;
buffer = dsp.AsyncBuffer;
```

Create a `voiceActivityDetector` System object and a `cepstralFeatureExtractor` System object. Specify that they operate in the frequency domain. Create a `dsp.SignalSink` to log the extracted cepstral features.

```
VAD = voiceActivityDetector('InputDomain','Frequency');
cepFeatures = cepstralFeatureExtractor('InputDomain','Frequency','SampleRate',fs,'LogEnergy','Replace');
sink = dsp.SignalSink;
```

In an audio stream loop:

1. Read one hop's worth of samples from the audio file and save the samples into the buffer.

2. Read a frame from the `buffer` with specified overlap from the previous frame.

3. Call the voice activity detector to get the probability of speech for the frame under analysis.

4. If the frame under analysis has a probability of speech greater than 0.75, extract cepstral features and log the features using the signal sink. Otherwise, write a vector of NaNs to the sink.

```
threshold = 0.75;
nanVector = nan(1,13);
while ~isDone(fileReader)
    audioIn = fileReader();
    write(buffer,audioIn);

    overlappedAudio = read(buffer,samplesPerFrame,samplesPerOverlap);
    X = fft(overlappedAudio,2048);

    probabilityOfSpeech = VAD(X);
    if probabilityOfSpeech > threshold
        xFeatures = cepFeatures(X);
        sink(xFeatures')
    else
        sink(nanVector)
    end
end
```

Visualize the cepstral coefficients over time.

```
timeVector = linspace(0,15,size(sink.Buffer,1));
plot(timeVector,sink.Buffer)
xlabel('Time (s)')
ylabel('MFCC Amplitude')
legend('Log-Energy','c1','c2','c3','c4','c5','c6','c7','c8','c9','c10','c11','c12')
```

Create a `dsp.AudioFileReader` object to read in audio data frame-by-frame. Create an `audioDeviceWriter` to write the audio to your sound card. Create a `dsp.ArrayPlot` to visualize the GTCC over time.

```
fileReader = dsp.AudioFileReader('RandomOscThree-24-96-stereo-13secs.aif');
deviceWriter = audioDeviceWriter(fileReader.SampleRate);
scope = dsp.ArrayPlot;
```

Create a `cepstralFeatureExtractor` that extracts GTCC.

```
cepFeatures = cepstralFeatureExtractor('FilterBank','Gammatone', ...
    'SampleRate',fileReader.SampleRate);
```

In an audio stream loop:

1. Read in a frame of audio data.

2. Extract the GTCC from the frame of audio.

3. Visualize the GTCC.

4. Write the audio frame to your device.

```
while ~isDone(fileReader)
    audioIn = fileReader();
    coeffs = cepFeatures(audioIn);
    scope(coeffs)
    deviceWriter(audioIn);
end

release(cepFeatures)
release(scope)
release(fileReader)
```