MATLAB Answers

fastText word embedding support package

Peter Mayhew on 24 Nov 2018
Answered: Peter Mayhew on 28 Nov 2018
I'm using the Text Analytics Toolbox and the Pretrained fastText word embedding support package. Is it possible to add additional words to the pretrained vocabulary?

Accepted Answer

Peter Mayhew on 28 Nov 2018
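To answer my own question: the example code below adds words to the embedding vocabulary. The existing embedding is not modified; instead, its vocabulary and vectors are extracted, the new words and (random) vectors are appended, and a new embedding object is created from the combined data.
>> emb = fastTextWordEmbedding;                      % pretrained fastText embedding (300-dimensional)
>> vocab = emb.Vocabulary;                           % existing vocabulary (string row vector)
>> mat = word2vec(emb, vocab);                       % one row per word: numel(vocab)-by-emb.Dimension
>> newvocab = [vocab "New Word 1" "New Word 2"];     % append the new words
>> newmat = [mat; randn(2, emb.Dimension)];          % append one random vector per new word
>> newemb = wordEmbedding(newvocab, newmat);         % build a new embedding from words and vectors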
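The same idea can be wrapped in a small reusable function. The sketch below uses a hypothetical helper named addWords (not a toolbox function) and a tiny toy embedding, so it can be tried without downloading the fastText support package. Two choices here are assumptions rather than part of the answer above: words that are already in the vocabulary are skipped (a word should appear only once in the vocabulary), and each new random vector is rescaled to the average length of the existing vectors, because plain randn vectors of dimension d have a length of roughly sqrt(d), which may be much larger than typical pretrained vectors.

```matlab
% Toy embedding: 3 words with 4-dimensional vectors (stands in for fastText).
words = ["cat" "dog" "fish"];
M = single(rand(3, 4));
emb = wordEmbedding(words, M);

% "dog" is already in the vocabulary, so only "hamster" is added.
newemb = addWords(emb, ["hamster" "dog"]);
disp(newemb.Vocabulary)

function newemb = addWords(emb, newWords)
% addWords (hypothetical helper): returns a NEW wordEmbedding containing the
% vocabulary of emb plus any words from newWords that emb does not have yet.
% New vectors are random directions scaled to the mean norm of existing vectors.

    % Remove duplicates and words already in the vocabulary; force a row vector.
    newWords = unique(newWords, 'stable');
    newWords = reshape(newWords(~isVocabularyWord(emb, newWords)), 1, []);

    % Existing vocabulary and its vectors (one row per word).
    vocab = emb.Vocabulary;
    mat = word2vec(emb, vocab);

    % Average Euclidean length of the existing word vectors.
    avgNorm = mean(vecnorm(mat, 2, 2));

    % Random directions with the same class as mat, each scaled to length avgNorm.
    R = randn(numel(newWords), emb.Dimension, 'like', mat);
    R = avgNorm * R ./ vecnorm(R, 2, 2);

    % Build the new embedding from the combined words and vectors.
    newemb = wordEmbedding([vocab newWords], [mat; R]);
end
```

With the real fastText embedding, the call would be newemb = addWords(fastTextWordEmbedding, ["newword1" "newword2"]). Random vectors only act as placeholders: they carry no meaning until they are trained or replaced with vectors learned from your own text.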
In addition, I have confirmed that it is possible to use the fastText pretrained vectors for 2 million words (trained on 600 billion tokens) instead of the default 1 million words (16 billion tokens) provided with the MATLAB fastTextWordEmbedding function.
To do this, replace the "wiki-news-300d-1M.vec.zip" file with the alternative pretrained word vectors file from https://fasttext.cc/docs/en/english-vectors.html
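A possible alternative to replacing the file inside the support package (a different technique from the one described above, and not the tested approach) is to unzip the downloaded file and read it directly with readWordEmbedding, which reads word embeddings stored in the word2vec/fastText text format. The sketch below assumes the 2-million-word file from the fastText page has already been downloaded into the current folder and is named crawl-300d-2M.vec.zip. Reading a text file of this size can take a while and needs several gigabytes of memory.

```matlab
% Assumed file name of the downloaded 2M-word fastText vectors (not part of
% the original answer); adjust to match the file you downloaded.
zipFile = 'crawl-300d-2M.vec.zip';

% unzip extracts into tempdir and returns a cell array of extracted file names.
files = unzip(zipFile, tempdir);

% Read the .vec text file (header line, then one word and its vector per line).
emb2M = readWordEmbedding(files{1});

fprintf('Vocabulary size: %d, dimension: %d\n', ...
    numel(emb2M.Vocabulary), emb2M.Dimension);
```

The resulting emb2M is an ordinary word embedding object, so it can be used wherever the output of fastTextWordEmbedding is used, including as the starting point for adding words.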