Text Analytics Toolbox seems to make many mistakes when recognizing language and part of speech

Hi,
My input is a list of VERY BASIC ENGLISH words, shown below. I would like to find the part of speech of each one.
kid
killer
kind
king
kiss
kitchen
knee
knife
knowledge
words = {'kid','killer','kind','king','kiss','kitchen','knee','knife','knowledge'};
words = string(words);
documents = tokenizedDocument(words);
documents = addPartOfSpeechDetails(documents);
tdetails = tokenDetails(documents);
When I check 'tdetails' (see below), this is where the mistakes show up.
Why does MATLAB think these words are German (the language should be 'en' for English) and adjectives (most of them should be nouns)?
tdetails =
  9×7 table
    Token          DocumentNumber    SentenceNumber    LineNumber    Type       Language    PartOfSpeech
    ___________    ______________    ______________    __________    _______    ________    ____________
    "kid"          1                 1                 1             letters    de          adjective
    "killer"       2                 1                 1             letters    de          adjective
    "kind"         3                 1                 1             letters    de          adjective
    "king"         4                 1                 1             letters    de          adjective
    "kiss"         5                 1                 1             letters    de          adjective
    "kitchen"      6                 1                 1             letters    de          adjective
    "knee"         7                 1                 1             letters    de          adjective
    "knife"        8                 1                 1             letters    de          adjective
    "knowledge"    9                 1                 1             letters    de          adjective

Answers (1)

Christopher Creutzig on 9 Mar 2020
Language detection also works much better on longer text. It does not do a dictionary lookup (and several of your words are valid German anyway); it uses statistical information about letter distribution.
Part of speech detection relies heavily on the context in a sentence.
documents = tokenizedDocument("My kid is a king");
documents = addPartOfSpeechDetails(documents);
tokenDetails(documents)
ans =
  5×7 table
    Token     DocumentNumber    SentenceNumber    LineNumber    Type       Language    PartOfSpeech
    ______    ______________    ______________    __________    _______    ________    ______________
    "My"      1                 1                 1             letters    en          pronoun
    "kid"     1                 1                 1             letters    en          noun
    "is"      1                 1                 1             letters    en          auxiliary-verb
    "a"       1                 1                 1             letters    en          determiner
    "king"    1                 1                 1             letters    en          noun

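For very short inputs like the word list in the question, one possible workaround for the language part is to state the language up front instead of relying on detection. The sketch below is not part of the original answer; it assumes the 'Language' name-value option of tokenizedDocument is available in your release. It only addresses the Language column: part-of-speech tags for isolated words may still be unreliable, because there is no sentence context to work with.

```matlab
% Same word list as in the question, one document per word.
words = ["kid","killer","kind","king","kiss","kitchen","knee","knife","knowledge"];

% Assumption: the 'Language' option is supported in your release.
% It sets the language explicitly, so no automatic detection is attempted.
documents = tokenizedDocument(words, 'Language', 'en');
documents = addPartOfSpeechDetails(documents);
tdetails = tokenDetails(documents);

% Show only the columns of interest. The PartOfSpeech values may still
% be off, since each word is tagged without any surrounding sentence.
tdetails(:, ["Token","Language","PartOfSpeech"])
```

If the part of speech matters more than the language label, feeding the words in as part of real sentences, as in the example above, is likely the more dependable route.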