Removing commas between columns in text data

Question

0 votes

I have a txt file which is the output of a lemmatizer, in the form

Sometimes, ,, I, use, commas, .
I, like, writing, ,, I, like, reading

How can I read it into a tokenizedDocument while deleting the unnecessary commas between tokens? A simple approach would be

test=readlines('/path/to/file.txt')
test=strrep(test,',','')
test=tokenizedDocument(test)

but it would also remove the commas already present in the original text, while I'd like to preserve punctuation.

Answer 1

Walter Roberson on 16 Oct 2021

2 votes

test = {'Sometimes, ,, I, use, commas, .'
    'I, like, writing, ,, I, like, reading'};
test = regexprep(test, {'(?<=[^,]),\s', '\s*,,', '\s+\.'}, {' ', ',', '.'})
test = 2×1 cell array
    {'Sometimes, I use commas.'      }
    {'I like writing, I like reading'}

Notice that periods needed a special rule. The text 'use, commas' should almost certainly become 'use commas' (so comma-space becomes space), but applying that same rule to 'commas, .' gives 'commas .', with an unwanted space before the period. The third pattern removes that space.

To put it another way, we cannot simply delete every comma-space pair. That would work for the pair between the word 'commas' and the period, but not for the pair between 'use' and 'commas': applying it there would merge the two words into 'usecommas'.
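
To make the three rules easier to follow, here is a small sketch that applies them one at a time to the first sample line. The variable names txt, step1, step2 and step3 are only illustrative; the patterns are the ones from the call above, and regexprep applies a cell array of patterns in the same order.

```matlab
% Apply the three patterns one after another to see each intermediate result.
txt = 'Sometimes, ,, I, use, commas, .';

% Rule 1: a comma that is not preceded by another comma, plus the
% whitespace after it, is a token separator and becomes a single space.
step1 = regexprep(txt, '(?<=[^,]),\s', ' ');   % 'Sometimes ,, I use commas .'

% Rule 2: what is left as ',,' is a literal comma followed by its
% separator; collapse it (and any whitespace before it) to one comma.
step2 = regexprep(step1, '\s*,,', ',');        % 'Sometimes, I use commas .'

% Rule 3: remove the whitespace left in front of a period.
step3 = regexprep(step2, '\s+\.', '.');        % 'Sometimes, I use commas.'
```

Other punctuation tokens (for example '!' or '?') would presumably need their own rule similar to rule 3; that is an assumption about the data, since the samples in the question only contain commas and periods.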

Kim Maria Damiani on 16 Oct 2021

Thank you!

Answer 2

Chunru on 16 Oct 2021

0 votes

test = {'Sometimes, ,, I, use, commas, .'
    'I, like, writing, ,, I, like, reading'};
test = regexprep(test, ',\s', ' ')
test = 2×1 cell array
    {'Sometimes , I use commas .'     }
    {'I like writing , I like reading'}
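
This result keeps every punctuation mark as its own space-separated token, which is close to a token list already. A possible variation, sketched here under the assumption that the lemmatizer always separates tokens with exactly ", ", is to split on that two-character separator and hand the tokens over without re-tokenizing. The names raw, tokens and docs are illustrative, and the 'TokenizeMethod','none' option (Text Analytics Toolbox) is used as I understand it from the documentation, so please check it against your release.

```matlab
% Sample lines in the lemmatizer's ", "-separated format.
raw = ["Sometimes, ,, I, use, commas, ."
       "I, like, writing, ,, I, like, reading"];

% Split each line on the separator ", " (comma followed by one space).
% A literal comma token survives because it is not followed by a space
% until its own separator comma.
tokens = cell(numel(raw), 1);
for k = 1:numel(raw)
    tokens{k} = split(raw(k), ", ").';   % row string vector of tokens
end

% tokens{1} is ["Sometimes" "," "I" "use" "commas" "."]

% Build the documents from the pre-split tokens
% (assumes Text Analytics Toolbox is installed).
docs = tokenizedDocument(tokens, 'TokenizeMethod', 'none');
```

When reading from the file, raw would come from readlines as in the question.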
