PROBLEM WITH FITTING LDA TO WORD COUNT MATRIX

Floor 97 on 16 Sep 2021
Answered: Aneela on 21 Feb 2024
Hi,
I have written this code:
data = readtable('project','TextType','string','PreserveVariableNames',true);
textData = data.speech;
labels = data.the_news_desk;
documents = tokenizedDocument(textData);
documents = erasePunctuation(documents);
documents = removeStopWords(documents);
numTopics = 3;
mdl = fitlda(documents,numTopics,'Verbose',0);
But MATLAB says that the first argument is not a valid type for the function. This is the error message:
Expected input to be one of these types:
bagOfWords, bagOfNgrams, double, single, uint8, uint16, uint32, uint64, int8, int16, int32, int64
Instead its type was tokenizedDocument.
How can I modify it?
Thanks,
Sara

Answers (1)

Aneela on 21 Feb 2024
Hi Sara,
The error states that the first argument of fitlda must be one of “bagOfWords, bagOfNgrams, double, single, uint8, uint16, uint32, uint64, int8, int16, int32, int64”, but documents is currently a tokenizedDocument array.
To resolve the error, convert the tokenized documents to a bag-of-words model before fitting. For example:
bagW = bagOfWords(documents);
numTopics = 3;
mdl = fitlda(bagW, numTopics, 'Verbose',0);
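To try the fix without the original file, here is a minimal sketch of the same pipeline on a toy corpus. The six sentences in textData are made up for illustration and only stand in for data.speech; the topics found on such a small corpus are not meaningful, so treat it as a check that the calls run, not as a modelling example.

```matlab
% Toy corpus standing in for data.speech (hypothetical text, not the asker's data).
textData = [
    "The government passed a new budget law today."
    "Parliament debated the budget and the tax law."
    "The team won the football match last night."
    "Fans cheered as the team scored in the match."
    "Scientists discovered a new planet near a distant star."
    "The telescope captured images of the planet and the star."];

% Same preprocessing steps as in the question.
documents = tokenizedDocument(textData);
documents = erasePunctuation(documents);
documents = removeStopWords(documents);

% fitlda does not accept tokenizedDocument, so build a bag-of-words model first.
bagW = bagOfWords(documents);

numTopics = 3;
mdl = fitlda(bagW, numTopics, 'Verbose', 0);

% Show the five highest-probability words of the first topic.
tbl = topkwords(mdl, 5, 1);
disp(tbl)
```

Running class(documents) and class(bagW) before the call to fitlda is a quick way to confirm which type is being passed.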
“bagOfWords” is the usual choice here because Latent Dirichlet Allocation (LDA) is a generative statistical model for discovering abstract topics in a collection of documents, where each topic is characterized by a distribution over words. The model therefore works from how often each word occurs in each document, not from the order of the tokens.
  • “bagOfWords” captures the word frequencies within each document, which is what LDA needs for topic modelling.
  • “bagOfNgrams” is useful if the relationship between consecutive words is important for your analysis.
  • Numeric types are for the case where you already have a matrix of word counts, with one row per document and one column per vocabulary word by default.
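The other two input options can be sketched in the same way. This is an illustrative example on made-up sentences, not part of the original answer: the variable names textData, bagN, counts, mdlN and mdlC are hypothetical, and the count matrix is simply taken from the Counts property of a bagOfWords object to show what a numeric input looks like.

```matlab
% Hypothetical toy corpus, for illustration only.
textData = [
    "the budget law was debated in parliament"
    "parliament passed the budget law"
    "the team won the football match"
    "the football team lost the match"];
documents = tokenizedDocument(textData);

% Option 1: bag-of-n-grams (here bigrams), keeping pairs of consecutive words.
bagN = bagOfNgrams(documents, 'NgramLengths', 2);
mdlN = fitlda(bagN, 2, 'Verbose', 0);

% Option 2: numeric word count matrix, documents in rows and words in columns.
bagW = bagOfWords(documents);
counts = full(bagW.Counts);      % Counts is stored as a sparse matrix
mdlC = fitlda(counts, 2, 'Verbose', 0);

% With numeric input the model does not know the words themselves,
% so keep the vocabulary to map columns back to words.
vocabulary = bagW.Vocabulary;
```

When a count matrix is passed, the fitted model only sees column indices, so the separately stored vocabulary is what links each column back to a word.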
For more details on the LDA model, see: https://www.mathworks.com/help/textanalytics/ref/ldamodel.html?s_tid=doc_ta
