

Pretrained speaker recognition system

Since R2021b



sr = speakerRecognition returns a pretrained speaker recognition system, 'ivec-english-16kHz'. The system is an ivectorSystem object trained on the LibriSpeech data set [1].


This example uses a pretrained speaker recognition system, 'ivec-english-16kHz'. The 'ivec-english-16kHz' system is an instance of ivectorSystem trained on the LibriSpeech data set.

Download the pretrained speaker recognition system into your temporary directory, whose location is specified by the MATLAB® tempdir command. If you want to place the data files in a folder different from tempdir, change the directory name. Add the temporary directory to the search path. Create an i-vector system.

fname = "";
URL = "" + fname;
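The remaining download steps described above can be sketched as follows. This is a minimal sketch, assuming that URL points to a zip archive of the pretrained system and that the archive extracts directly into the target folder; neither assumption is confirmed by this page.

```matlab
% Assumption: URL (defined above) is the address of a zip archive.
% unzip accepts a URL as its source and extracts into the given folder.
% Replace tempdir here and in addpath to use a different folder.
unzip(URL,tempdir)

% Add the folder to the search path so that speakerRecognition can find the
% downloaded system. If the archive extracts into a subfolder, add that
% subfolder instead.
addpath(tempdir)
```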



sr = speakerRecognition;

This example uses two speech signals, each of which contains the phrase "volume up" spoken several times with different intonations. In one of the signals, the speaker is male. In the other signal, the speaker is female.

Read each signal and split it into two parts. One of the parts is used to enroll the speaker. The other part is used for speaker verification and identification.

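% Male speaker: enroll with the first 3 seconds, test with the remainder.
[bf,fs] = audioread("MaleVolumeUp-16-mono-6secs.ogg");
enrollBF = bf(1:3*fs);
testBF = bf(3*fs+1:end);
bfLabel = "BF";

% Female speaker: enroll with the first 5 seconds, test with the remainder.
[rd,fs] = audioread("FemaleVolumeUp-16-mono-11secs.ogg");
enrollRD = rd(1:5*fs);
testRD = rd(5*fs+1:end);
rdLabel = "RD";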

Enroll the speakers in the speaker recognition system. Enrollment creates a template for each speaker that can be used for verification or identification.
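The enrollment call itself does not appear in this listing. A minimal sketch, assuming the enroll syntax of ivectorSystem that takes a cell array of signals and a matching array of labels:

```matlab
% Assumed call: enroll both speakers at once. Each cell holds one enrollment
% signal, and the labels are listed in the same order as the signals.
enroll(sr,{enrollBF,enrollRD},[bfLabel,rdLabel])
```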

Extracting i-vectors ...done.
Enrolling i-vectors .....done.
Enrollment complete.

Call the identify function on the test data.

candidates = identify(sr,testBF)
candidates=2×2 table
    Label      Score  
    _____    _________

     BF        0.99474
     RD      0.0017846

candidates = identify(sr,testRD)
candidates=2×2 table
    Label      Score   
    _____    __________

     RD         0.24113
     BF      3.2741e-05

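Call the verify function with the test data to confirm that the system correctly accepts or rejects speakers. The function returns logical 1 (true) when the signal is accepted as the claimed speaker and logical 0 (false) when it is rejected.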

isVerified = verify(sr,testBF,bfLabel)
isVerified = logical

isVerified = verify(sr,testBF,rdLabel)
isVerified = logical

isVerified = verify(sr,testRD,rdLabel)
isVerified = logical

isVerified = verify(sr,testRD,bfLabel)
isVerified = logical

Call the info function to get information about how the model was trained.
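The call is not shown in this listing. A sketch, assuming info is called on the system object with no other arguments and prints its report to the Command Window:

```matlab
% Display the training, calibration, and evaluation summary for the system.
info(sr)
```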

  - This system was trained using the LibriSpeech train and development sets.
  LibriSpeech is an approximately 1000-hour corpus of read English speech sampled at 16 kHz.
  - The detection error tradeoff was determined by enrolling one file from each speaker in the 
  LibriSpeech test set, and then evaluating exhaustive pairs of the enrolled and remaining data.
  - The system was calibrated using the train-clean-100 and dev-clean data of LibriSpeech.

i-vector system input
  Input feature vector length: 60
  Input data type: double

trainExtractor
  Train signals: 286808
  UBMNumComponents: 2048
  UBMNumIterations: 10
  TVSRank: 512
  TVSNumIterations: 5

trainClassifier
  Train signals: 286807
  Train labels: 1 (91), 100043 (31) ... and 5652 more
  NumEigenvectors: 200
  PLDANumDimensions: 200
  PLDANumIterations: 5

calibrate
  Calibration signals: 31242
  Calibration labels: 103 (102), 1034 (96) ... and 289 more

detectionErrorTradeoff
  Evaluation signals: 5382
  Evaluation labels: 102255 (46), 1066 (24) ... and 175 more

Remove the temporary directory from the search path.
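A minimal sketch of this cleanup step, assuming tempdir is the folder that was added to the search path earlier in the example:

```matlab
% Remove the download folder from the search path. If you added a different
% folder earlier, pass that folder to rmpath instead.
rmpath(tempdir)
```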


Output Arguments

sr: Pretrained speaker recognition system, returned as an ivectorSystem object.


References

[1] Panayotov, Vassil, Guoguo Chen, Daniel Povey, and Sanjeev Khudanpur. “Librispeech: An ASR Corpus Based on Public Domain Audio Books.” In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 5206–10. South Brisbane, Queensland, Australia: IEEE, 2015.

Version History

Introduced in R2021b