How was the exampleWordEmbedding example in the Text Analytics Toolbox trained, in detail?
The documentation for readWordEmbedding provides a pre-trained example embedding, saying only that it was "derived by analyzing text from Wikipedia".
How was it trained?
Should we consider it a 'high quality' word embedding, better than anything a user could generate without extensive work and CPU time? Or is it a quick-and-dirty starting point, and are we encouraged to train our own for better performance?
Answers (1)
Christopher Creutzig
on 9 Mar 2020
The embedding is rather low-dimensional (50 dimensions) and has a small vocabulary (with 9999 words). It is unlikely to be “high quality” unless your analysis just happens to need precisely this dataset.
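A minimal sketch of loading and inspecting that example embedding (assuming Text Analytics Toolbox is installed; the nearest-neighbour output depends on which vocabulary word you pick):

% Load the small example embedding that ships with Text Analytics Toolbox.
filename = "exampleWordEmbedding.vec";
emb = readWordEmbedding(filename);

% Inspect its size: 50 dimensions and a vocabulary of 9999 words.
emb.Dimension
numel(emb.Vocabulary)

% Pick a word that is guaranteed to be in the vocabulary and list its
% five nearest neighbours in the embedding space.
w = emb.Vocabulary(1);
vec = word2vec(emb, w);
vec2word(emb, vec, 5)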
For production use, you are much more likely to find fastTextWordEmbedding useful; it downloads the data from https://www.mathworks.com/matlabcentral/fileexchange/66229-text-analytics-toolbox-model-for-fasttext-english-16-billion-token-word-embedding for you.
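A minimal usage sketch, assuming the support package linked above is installed (MATLAB offers to download it the first time fastTextWordEmbedding is called); the neighbours printed are illustrative only:

% Load the pre-trained fastText embedding (300 dimensions, about one
% million words). Requires the "Text Analytics Toolbox Model for fastText
% English 16 Billion Token Word Embedding" support package.
emb = fastTextWordEmbedding;

% Same wordEmbedding API as the example file, but a far larger vocabulary.
emb.Dimension
numel(emb.Vocabulary)

% Nearest neighbours of a common word.
vec2word(emb, word2vec(emb, "king"), 5)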