detokenizedDocument: How to turn tokenized text back into human-readable, non-tokenized text?
After tokenizing and manipulating a document, how do you put the results back together into non-tokenized, human-readable form?
The tokenization process adds spaces and breaks up text elements, and I have not found a straightforward way to get back to usable text. Is there a function or method for doing this? Here is an example:
textData = "Jim and Suzie wanted Jimmy’s to have as few 'complex' ingredients as possible (less than the seventeen they had seen some brands use).";
d = tokenizedDocument(textData);
join(string(d))
ans =
"Jim and Suzie wanted Jimmy’s to have as few ' complex ' ingredients as possible ( less than the seventeen they had seen some brands use ) ."
Note that spaces have been added between tokens.
I have tried many different MATLAB functions to get usable, readable text back, without success.
It would be great to have the answer, which I hope is simple, and to have it added to the help for tokenizedDocument.
Also, it would be great if examples such as the one for correcting spelling yielded a complete, corrected document rather than a tokenized document, since it isn't obvious how to get the usable, original-form document back from the tokenized form.
Thank you
  1 Comment
CdC on 26 Aug 2022
Edited: CdC on 26 Aug 2022
Can someone at MathWorks please reply?
This is a fundamental requirement to be able to use the Text Analytics Toolbox. For example, being able to correct spelling (or use many of the other analytics functions) on text, but then not being able to put the result back into a usable text form does not accomplish anything useful.
I'm an experienced MATLAB user, and I've spent hours and hours trying to figure out a way to do this, thus far with no success. This really needs to be resolved. Thank you for your help.
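As a stopgap until there is a built-in answer, one rough workaround is to clean up the joined string with regexprep. This is only a heuristic sketch, not a toolbox feature: the variable names joined and out are hypothetical, the rules cover only the punctuation in the example above (closing punctuation, opening brackets, and paired straight single quotes), and it assumes straight single quotes always come in pairs, which contractions written with a straight apostrophe would break.

```matlab
% Joined tokens, exactly as returned by join(string(d)) in the question.
joined = "Jim and Suzie wanted Jimmy’s to have as few ' complex ' ingredients as possible ( less than the seventeen they had seen some brands use ) .";

% Heuristic cleanup rules (assumptions, not a general detokenizer):
out = regexprep(joined, "\s+([.,;:!?)\]}])", "$1");  % drop whitespace before closing punctuation
out = regexprep(out, "([(\[{])\s+", "$1");           % drop whitespace after opening brackets
out = regexprep(out, "'\s+([^']*?)\s+'", "'$1'");    % tighten paired straight single quotes

disp(out)
```

For this one sentence the result should match the original textData, but other inputs (dashes, double quotes, URLs, straight apostrophes) would need additional rules, so this does not replace a proper detokenizing function.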


Answers (0)

Version

R2022a
