Increasing vocabulary of pre-trained word embeddings
1 Ansicht (letzte 30 Tage)
Ältere Kommentare anzeigen
MathWorks Support Team
am 3 Mai 2019
Bearbeitet: MathWorks Support Team
am 27 Sep. 2021
Can we extend the pre-trained word embeddings and increase the vocabulary?
Akzeptierte Antwort
MathWorks Support Team
am 2 Sep. 2021
Bearbeitet: MathWorks Support Team
am 27 Sep. 2021
Yes. In order to add more words to the existing vocabulary given by 'fastTextWordEmbedding', you can try the following:
1. Obtain the wordEmbedding object for 'fastTextWordEmbedding'-
>> emb = fastTextWordEmbedding;
2. Obtain the vocabulary from the wordEmbedding object:
>> vocab = emb.Vocabulary;
3. Add more words to the string array, for example:
>> vocab(end+1) = 'Hi';
>> vocab(end+1) = 'Hello';
4. Write to a text file with UTF-8 encoding in either the word2vec or GloVe text embedding format, or a zip file containing a text file of this format. You can use fopen, fprintf and fclose for this step:
5. Use 'readWordEmbedding' to read this text file with additional words, to get a new word embedding object. The doc page for 'readWordEmbedding' would explain more about why the file needs to be in the above format.
0 Kommentare
Weitere Antworten (0)
Siehe auch
Kategorien
Mehr zu Migrate GUIDE Apps finden Sie in Help Center und File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!