Extract word matrix and context matrix from output of trainWordEmbedding / word2vec

When I train a word embedding on a set of documents with trainWordEmbedding, I get an object "emb" as output that I can pass to word2vec. word2vec then gives me, for each word, a vector that I can process further.
However, I would also like to get the underlying word matrix and context matrix, as well as the value of the training loss. Does anyone know how I can access these data?
1 Comment
Christopher Creutzig on 26 Nov 2018
What exactly do you mean by “word matrix” and “context matrix”?
I guess the “context matrix” is what (some) other people call the cooccurrence matrix in the skip-gram model? We do not currently have a way to compute that.


Answers (1)

Jayanti on 14 Feb 2025
Hi Daniel,
By “word matrix” I assume you mean the unique words in the documents. When you use “trainWordEmbedding” to train a word embedding model on a set of documents, it returns a wordEmbedding object, here called “emb”. This object has a property named “Vocabulary”, which contains the unique words in the model, stored as a string vector. You can access these words with the following code:
emb = trainWordEmbedding(filename);
words = emb.Vocabulary;
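If by “word matrix” you instead mean the matrix of learned word vectors, one way to assemble it is to pass the whole vocabulary to “word2vec” at once. The following is only a minimal sketch: the file name is a placeholder (I am assuming the example file sonnetsPreprocessed.txt from the Text Analytics Toolbox documentation is available; replace it with your own training file), and the variable names W, k and vk are my own.

```matlab
% Placeholder training file (assumption); replace with your own text file.
filename = "sonnetsPreprocessed.txt";
emb = trainWordEmbedding(filename);

% Vocabulary of the trained model, as a string array.
words = emb.Vocabulary;

% Stack the vectors of all vocabulary words into one matrix.
% W has one row per word and emb.Dimension columns;
% row i is the embedding vector of words(i).
W = word2vec(emb, words);

% Look up the vector of one particular word by its row index.
k = 1;                 % any index into the vocabulary
vk = W(k, :);          % same vector as word2vec(emb, words(k))
```

This only gives the word vectors that “word2vec” already returns, arranged as a single matrix, so it may or may not be what you mean by “word matrix”.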
By “context matrix” I assume you mean the co-occurrence matrix. However, I couldn't find any documentation on accessing a co-occurrence matrix directly through “trainWordEmbedding” or “word2vec”.
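If plain co-occurrence counts are enough, they can be computed by hand from the tokenized text. The following is a rough sketch in base MATLAB and not something “trainWordEmbedding” produces internally: the toy corpus, the window size of 2 and all variable names are assumptions made for illustration.

```matlab
% Toy corpus (hypothetical data): each document is a cell array of tokens.
docs = {{'the','cat','sat','on','the','mat'}, ...
        {'the','dog','sat','on','the','log'}};
window = 2;   % number of words on each side counted as context (assumed)

% Build the vocabulary and map every token to its vocabulary index.
allTokens = [docs{:}];
[vocab, ~, flatIdx] = unique(allTokens);
V = numel(vocab);

% C(i,j) counts how often word j occurs within the window around word i.
C = zeros(V);
pos = 0;
for d = 1:numel(docs)
    n = numel(docs{d});
    idx = flatIdx(pos+1:pos+n);   % vocabulary indices of this document
    pos = pos + n;
    for t = 1:n
        lo = max(1, t - window);
        hi = min(n, t + window);
        for c = [lo:t-1, t+1:hi]  % context positions, excluding t itself
            C(idx(t), idx(c)) = C(idx(t), idx(c)) + 1;
        end
    end
end
```

Rows and columns of C follow the order of vocab. Because the window is symmetric and never crosses document boundaries, C is symmetric. For a large vocabulary a sparse matrix would be a better choice than zeros(V).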
Hope this helps!

Version

R2017b
