Matrix index is out of range for deletion

oliver on 10 Apr 2023
Commented: Walter Roberson on 10 Apr 2023
My project is sentiment analysis, and I am trying to follow the tutorial "Create Simple Text Model for Classification".
My database is a list of reviews with labelled sentiment (either 'positive' or 'negative').
I am trying to remove any documents containing no words from the bag-of-words model, and remove the corresponding entries in the labels.
My code is:
filename = "IMBD_reviews_smol.csv";
data = readtable(filename,'TextType','string');
data.sentiment = categorical(data.sentiment);
cvp = cvpartition(data.sentiment,'Holdout',0.1);
dataTrain = data(cvp.training,:);
dataTest = data(cvp.test,:);
textDataTrain = dataTrain.review;
textDataTest = dataTest.review;
YTrain = dataTrain.sentiment;
YTest = dataTest.sentiment;
documents = preprocessText(textDataTrain);
bag = bagOfWords(documents);
bag = removeInfrequentWords(bag,2);
[bag,idx] = removeEmptyDocuments(bag);
Ytrain(idx) = []; %produces an error
Deletion requires an existing variable.
Xtrain = bag.Counts;
mdl = fitcecoc(Xtrain,YTrain,"Learners","linear");
function documents = preprocessText(textData)
documents = tokenizedDocument(textData);
documents = addPartOfSpeechDetails(documents);
documents = removeStopWords(documents);
documents = erasePunctuation(documents);
documents = removeShortWords(documents,2);
documents = removeShortWords(documents,15);
end
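Editorial note: the "Deletion requires an existing variable." message is consistent with MATLAB variable names being case-sensitive. The code above creates YTrain (capital T) but then deletes from Ytrain (lowercase t), which was never assigned. A minimal sketch of that situation, using made-up labels and a made-up index rather than the poster's data:

```matlab
% Toy labels standing in for dataTrain.sentiment (assumed values).
YTrain = categorical(["positive"; "negative"; "positive"]);
idx = 2;   % pretend document 2 was reported as empty

try
    % Ytrain (lowercase t) does not exist, so deletion from it fails.
    Ytrain(idx) = [];
catch err
    disp(err.message)
end

% Using the name that was actually assigned works as intended.
YTrain(idx) = [];
```

The same kind of mismatch appears later in the thread between XTrain and Xtrain, so it is worth settling on one spelling for each variable.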
  7 Comments
oliver on 10 Apr 2023
With the code below I receive the error message "Error using classreg.learning.classif.FullClassificationModel.prepareData
No class names are found in input labels." for line 25, "mdl = fitcecoc(Xtrain,YTrain,"Learners","linear");"
filename = "IMBD_reviews_smol.csv";
data = readtable(filename,'TextType','string');
data.sentiment = categorical(data.sentiment);
cvp = cvpartition(data.sentiment,'Holdout',0.1);
dataTrain = data(cvp.training,:);
dataTest = data(cvp.test,:);
textDataTrain = dataTrain.review;
textDataTest = dataTest.review;
YTrain = dataTrain.sentiment;
YTest = dataTest.sentiment;
documents = preprocessText(textDataTrain);
bag = bagOfWords(documents);
bag = removeInfrequentWords(bag,2);
[bag,idx] = removeEmptyDocuments(bag);
YTrain = [];
XTrain = bag.Counts;
mdl = fitcecoc(Xtrain,YTrain,"Learners","linear");
documentsTest = preprocessText(textDataTest);
XTest = encode(bag,documentsTest);
YPred = predict(mdl,XTest);
acc = sum(YPred == YTest)/numel(YTest);
str = [
"i hated this movie."
"this was really good"
"sometimes slow movies work out in the way you want and thats how this movie went"];
documentsNew = preprocessText(str);
XNew = encode(bag,documentsNew);
labelsNew = predict(mdl,XNew);
function documents = preprocessText(textData)
documents = tokenizedDocument(textData);
documents = addPartOfSpeechDetails(documents);
documents = removeStopWords(documents);
documents = erasePunctuation(documents);
documents = removeShortWords(documents,2);
documents = removeShortWords(documents,15);
end
Walter Roberson on 10 Apr 2023
Yes, as I indicated, you are removing all documents from the bag, so your training information becomes empty.

Accepted Answer

Walter Roberson on 10 Apr 2023
Moved: Walter Roberson on 10 Apr 2023
filename = "IMBD_reviews_smol.csv";
data = readtable(filename,'TextType','string');
data.sentiment = categorical(data.sentiment);
cvp = cvpartition(data.sentiment,'Holdout',0.1);
dataTrain = data(cvp.training,:);
dataTest = data(cvp.test,:);
textDataTrain = dataTrain.review;
textDataTest = dataTest.review;
Ytrain = dataTrain.sentiment;
Ytest = dataTest.sentiment;
documents = preprocessText(textDataTrain);
bag = bagOfWords(documents);
bag = removeInfrequentWords(bag,2);
[bag,idx] = removeEmptyDocuments(bag);
whos Ytrain idx
  Name          Size            Bytes  Class          Attributes

  Ytrain      181x1               423  categorical
  idx           1x181            1448  double
Ytrain(idx) = []; %produces an error
Xtrain = bag.Counts;
whos
  Name                 Size             Bytes  Class                Attributes

  Xtrain               0x0                 24  double               sparse
  Ytest               20x1                262  categorical
  Ytrain               0x1                242  categorical
  ans                  1x46                92  char
  bag                  1x1                640  bagOfWords
  cmdout               1x33                66  char
  cvp                  1x1               3278  cvpartition
  data               201x2             543470  table
  dataTest            20x2              66077  table
  dataTrain          181x2             478944  table
  documents          181x1              43321  tokenizedDocument
  filename             1x1                178  string
  idx                  1x181             1448  double
  textDataTest        20x1              64602  string
  textDataTrain      181x1             477308  string
mdl = fitcecoc(Xtrain, Ytrain, "Learners", "linear");
Error using classreg.learning.classif.FullClassificationModel.prepareData
No class names are found in input labels.

Error in ClassificationECOC.prepareData (line 128)
classreg.learning.classif.FullClassificationModel.prepareData(X,Y,varargin{:});

Error in classreg.learning.FitTemplate/fit (line 246)
this.PrepareData(X,Y,this.BaseFitObjectArgs{:});

Error in ClassificationECOC.fit (line 119)
this = fit(temp,X,Y);

Error in fitcecoc (line 357)
obj = ClassificationECOC.fit(X,Y,varargin{:});
function documents = preprocessText(textData)
documents = tokenizedDocument(textData);
documents = addPartOfSpeechDetails(documents);
documents = removeStopWords(documents);
documents = erasePunctuation(documents);
documents = removeShortWords(documents,2);
documents = removeShortWords(documents,15);
end
You are removing all of the documents. The bag is left empty.
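Editorial note: in the output above, idx is 1x181, so every one of the 181 training documents was reported as empty and Ytrain(idx) = [] leaves a 0x1 label array. A minimal sketch of a guard that catches this before calling fitcecoc, using toy stand-ins for the thread's variables (the sizes, the trainingIsEmpty name, and the warning text are illustrative assumptions):

```matlab
% Toy stand-ins for the variables in the thread (assumed values).
Ytrain = categorical(["positive"; "negative"; "positive"]);
idx = 1:3;                 % every document was reported as empty

Ytrain(idx) = [];          % deletes all labels, leaving a 0x1 array
Xtrain = sparse(0, 0);     % stand-in for bag.Counts of an empty bag

% Hypothetical guard: flag the problem before the classifier is fitted.
trainingIsEmpty = isempty(Xtrain) || isempty(Ytrain);
if trainingIsEmpty
    warning('No documents left after preprocessing. Check the cleanup steps.');
end
```

A check like this points at the preprocessing steps directly, instead of surfacing later as "No class names are found in input labels."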
  2 Comments
oliver on 10 Apr 2023
Edited: Walter Roberson on 10 Apr 2023
I am trying to follow this MATLAB example https://uk.mathworks.com/help/textanalytics/ug/create-simple-text-model-for-classification.html but using my own dataset. Can you help with what I need to change?
Walter Roberson on 10 Apr 2023
You were calling removeShortWords twice, so every word of 15 characters or fewer was being removed. The remaining "words" all happened to be unique, so removing infrequent words resulted in an empty bag. The second call should be removeLongWords, as in the corrected code:
filename = "IMBD_reviews_smol.csv";
data = readtable(filename,'TextType','string');
data.sentiment = categorical(data.sentiment);
cvp = cvpartition(data.sentiment,'Holdout',0.1);
dataTrain = data(cvp.training,:);
dataTest = data(cvp.test,:);
textDataTrain = dataTrain.review;
textDataTest = dataTest.review;
Ytrain = dataTrain.sentiment;
Ytest = dataTest.sentiment;
documents = preprocessText(textDataTrain);
bag = bagOfWords(documents);
bag = removeInfrequentWords(bag,2);
[bag,idx] = removeEmptyDocuments(bag);
Ytrain(idx) = [];
Xtrain = bag.Counts;
mdl = fitcecoc(Xtrain, Ytrain, "Learners", "linear");
mdl
mdl =
  CompactClassificationECOC
      ResponseName: 'Y'
        ClassNames: [negative positive]
    ScoreTransform: 'none'
    BinaryLearners: {[1×1 ClassificationLinear]}
      CodingMatrix: [2×1 double]

  Properties, Methods
function documents = preprocessText(textData)
documents = tokenizedDocument(textData);
documents = addPartOfSpeechDetails(documents);
documents = removeStopWords(documents);
documents = erasePunctuation(documents);
documents = removeShortWords(documents,2);
documents = removeLongWords(documents,15);
end
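Editorial note: the effect of the one-word fix can be seen on a plain string array. The sketch below does not call the Text Analytics Toolbox. It emulates the length filters with strlength, assuming the documented behavior that removeShortWords(documents,len) drops words of length len or less and removeLongWords(documents,len) drops words of length len or greater. The word list is made up for illustration.

```matlab
% Emulation on a plain string array (not the toolbox functions themselves).
words = ["a" "of" "movie" "really" "good" "uncharacteristically"];
len = strlength(words);

% Original pipeline: removeShortWords(...,2) then removeShortWords(...,15)
% keeps only words longer than 15 characters.
keptBuggy = words(len > 2 & len > 15);

% Corrected pipeline: removeShortWords(...,2) then removeLongWords(...,15)
% keeps words with 3 to 14 characters.
keptFixed = words(len > 2 & len < 15);

disp(keptBuggy)
disp(keptFixed)
```

With the original pair of calls only very long, typically one-off tokens survive, which is why removeInfrequentWords then emptied the bag.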

More Answers (0)

Release

R2022b
