What formula is used to calculate perplexity in fitlda?

Stephen Bruestle on 22 Jan 2019
Answered: Ilya on 13 Mar 2019
Many sources have different formulas. I want to make sure that I am referencing the correct formula.

Accepted Answer

Ilya on 13 Mar 2019
If you are asking about the 2nd output from the logp method, document log-probabilities are estimated using the Mean-Field Approximation described in the paper cited at the bottom of that doc page. Perplexity is then
exp(-sum(logprob)/Nwords)
where Nwords is the total word count across all documents.
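As a quick sanity check on that formula, here is a minimal sketch in plain MATLAB (no toolbox calls). The vocabulary size V, the word counts, and the log-probabilities are made-up values for illustration, not output from logp. It uses the fact that if every word had probability 1/V, the perplexity would come out as V.

```matlab
% Hypothetical setup: 3 documents, vocabulary of V equally likely words.
V = 1000;                      % assumed vocabulary size (illustrative)
wordCounts = [6; 9; 4];        % assumed number of words in each document

% Under a uniform word model, a document with n words has
% log-probability n*log(1/V) = -n*log(V).
logprob = -wordCounts * log(V);

% Perplexity as given above: exp(-sum(logprob)/Nwords)
Nwords = sum(wordCounts);      % total word count across all documents
ppl = exp(-sum(logprob) / Nwords);

fprintf('Perplexity: %g\n', ppl);   % equals V, up to rounding error
```

So perplexity can be read as the effective number of equally likely words the model is choosing between at each position; lower is better.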
If you are asking about the perplexity displayed during training when you pass 'Verbose' to fitlda, those document log-probabilities are computed using the current estimates of topic probabilities per document. The perplexity formula is the same as above. Because the document log-probabilities are evaluated at the maximum-likelihood estimates of topic probabilities per document, these document probabilities are overestimated and perplexity is therefore underestimated. This is done for speed. The mean-field approximation (MFA) approach gives a more accurate estimate by integrating over topic probabilities, at the cost of longer runtime.
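To see the two estimates side by side, something like the following sketch could be used. It requires Text Analytics Toolbox. The toy corpus and the number of topics are placeholders, and two things here are assumptions on my part rather than confirmed behavior: that the fitted model exposes the training perplexity as FitInfo.Perplexity, and that Nwords can be taken as the sum of the bag-of-words counts.

```matlab
% Toy corpus (placeholder data, far too small for a meaningful model).
textData = [
    "the cat sat on the mat"
    "the dog chased the cat"
    "stocks fell as markets closed"
    "markets rallied and stocks rose"];
documents = tokenizedDocument(textData);
bag = bagOfWords(documents);

% Fit LDA; 'Verbose',1 prints the fast training-time perplexity.
numTopics = 2;                              % arbitrary choice for the sketch
mdl = fitlda(bag, numTopics, 'Verbose', 1);

% MFA-based document log-probabilities and perplexity from logp.
[logProb, ppl] = logp(mdl, documents);

% Recompute perplexity by hand from the formula above.
% Assumption: Nwords is the total of all word counts in the bag.
Nwords = full(sum(bag.Counts(:)));
pplManual = exp(-sum(logProb) / Nwords);

fprintf('logp perplexity:     %g\n', ppl);
fprintf('manual perplexity:   %g\n', pplManual);

% Assumption: FitInfo.Perplexity holds the training-time estimate.
fprintf('training perplexity: %g\n', mdl.FitInfo.Perplexity);
```

If the description above holds, the first two numbers should agree, and the training-time value should come out lower than the logp value on the same documents.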
