How to extract a list of unique words from a set of one-row strings

Question

Harrison on 14 Nov. 2024

Direct link to this question:

https://de.mathworks.com/matlabcentral/answers/2166149-how-to-extract-a-list-of-unique-words-from-a-set-of-one-row-strings

Commented: Harrison on 15 Nov. 2024

Basically I have a set of 11 strings of words, and no string repeats a word, but I need a list of every unique word across all 11 strings.

I've found that this works for one string at a time, but I can't get a list for all 11 strings this way.

A{1} = updatedDocuments(1,1)

B{1} = strjoin(unique(strtrim(strsplit(A{1}, ',')))', '')

Is it possible to index A{1} as updatedDocuments(1:11,1) or do something similar?
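For reference, one way to extend the one-string code to every string is to loop over them, collect all comma-separated words, and call unique once on the combined list. Braces {} return the contents of a cell, while parentheses () return a sub-cell, so the loop indexes with braces. This is a minimal sketch that assumes updatedDocuments is a cell array of character vectors (the question does not state its type); the three sample strings are hypothetical stand-ins for the real 11.

```matlab
% Hypothetical sample data: a column cell array of comma-separated words
updatedDocuments = {'apple, banana, cherry'; 'banana, date'; 'cherry, elderberry, apple'};

allWords = {};   % accumulates the words from every string
for k = 1:numel(updatedDocuments)
    % Braces return the char vector stored in cell k; parentheses would return a 1x1 cell
    words = strtrim(strsplit(updatedDocuments{k}, ','));
    allWords = [allWords, words];   % append this string's words to the running row
end

% One call to unique over the combined list gives the sorted unique words
uniqueWords = unique(allWords)
```

For many strings, the loop could be replaced by cellfun, but the loop keeps each step visible.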

Answer 1 (accepted)

Madheswaran on 14 Nov. 2024

Direct link to this answer:

https://de.mathworks.com/matlabcentral/answers/2166149-how-to-extract-a-list-of-unique-words-from-a-set-of-one-row-strings#answer_1545194

Edited: Madheswaran on 15 Nov. 2024

Hi @Harrison,

I am assuming the following:

'updatedDocuments' is an array of 'tokenizedDocument' objects
Each document contains comma-separated text that doesn't end with a comma

To get the unique words across the entire set of strings, you can use the following approach:

% Remove commas from the documents if you don't want a comma to be
% included in 'uniqueWords'
updatedDocuments = removeWords(updatedDocuments, ",");
uniqueWords = updatedDocuments.Vocabulary;

If 'updatedDocuments' is instead a cell array of character vectors, you can use the following approach:

updatedDocuments = strcat(updatedDocuments, ','); % Add a comma at the end of each cell
allWords = strjoin(updatedDocuments(1:11,1)', ' '); % Transpose to a 1-by-11 row (strjoin expects a row) and join into one char vector
allWords = strtrim(strsplit(allWords, ',')); % Split with comma as delimiter and trim whitespace
allWords(cellfun(@isempty, allWords)) = []; % Remove the empty entry produced by the final trailing comma
uniqueWords = unique(allWords); % Unique words (1 x n cell, where n is the number of unique words)
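A slightly shorter variant of the cell-array approach joins the strings with a comma as the delimiter, so no comma has to be appended to each cell and no empty entry appears after splitting. This is a sketch under the same assumption as above (each string is comma-separated and does not end with a comma); the sample data is hypothetical.

```matlab
% Hypothetical sample data standing in for updatedDocuments (a column cell array)
updatedDocuments = {'red, green'; 'green, blue'; 'blue, red, yellow'};

% strjoin expects a 1-by-n row, so reshape with (:)' before joining;
% using ',' as the delimiter keeps a boundary between consecutive strings
allText = strjoin(updatedDocuments(:)', ',');

% Split on commas and strip the spaces that followed each comma
allWords = strtrim(strsplit(allText, ','));

% No trailing comma was added, so there is no empty entry to remove
uniqueWords = unique(allWords)
```

Using (:)' instead of (1:11,1)' also avoids hard-coding the number of strings.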

For more information, refer to the documentation for the functions used above.

Hope this helps!

3 Comments (1 older comment not shown)

Madheswaran on 15 Nov. 2024

That is because I assumed 'updatedDocuments' to be a cell array of character vectors. If 'updatedDocuments' is an array of 'tokenizedDocument' objects, the problem is straightforward to solve. I have updated the answer to include a solution for that case, in addition to the existing explanation.

Let me know if that helps!

Harrison on 15 Nov. 2024

That's exactly right! Thank you!!


Answer 2

Paul on 14 Nov. 2024

Direct link to this answer:

https://de.mathworks.com/matlabcentral/answers/2166149-how-to-extract-a-list-of-unique-words-from-a-set-of-one-row-strings#answer_1544974

If UpdatedDocuments is a 1-D cell array of character vectors, this returns the unique words of each string separately:

UpdatedDocuments{1} = 'one,two,three,one';
UpdatedDocuments{2} = 'one,two,three,two';
UpdatedDocuments{3} = 'one,two,three,three';
result = cellfun(@(S) strjoin(unique(strtrim(strsplit(S, ','))),','),UpdatedDocuments,'Uni',false)
result = 1x3 cell array
    {'one,three,two'}    {'one,three,two'}    {'one,three,two'}
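The cellfun call above deduplicates each string on its own. To get one list of unique words across all strings, the per-string word lists can be concatenated before calling unique. A minimal sketch reusing the same sample data:

```matlab
% Same sample data as above, as a 1-by-3 cell array of char vectors
UpdatedDocuments = {'one,two,three,one', 'one,two,three,two', 'one,two,three,three'};

% Split each string into a row cell of trimmed words (one cell per input string)
parts = cellfun(@(S) strtrim(strsplit(S, ',')), UpdatedDocuments, 'UniformOutput', false);

% [parts{:}] concatenates the per-string rows into one row cell of words
allWords = [parts{:}];

% Deduplicate across all strings at once (result is sorted)
uniqueWords = unique(allWords)
```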

1 Comment

Paul on 15 Nov. 2024

The Vocabulary property of tokenizedDocument returns the unique words across all documents in the array:

documents = tokenizedDocument([
    "an example of a short sentence  an example of a short sentence " 
    "a second short sentence a second short sentence"]);
documents
documents = 
  2x1 tokenizedDocument:

    12 tokens: an example of a short sentence an example of a short sentence
     8 tokens: a second short sentence a second short sentence
documents.Vocabulary
ans = 1x7 string array
    "an"    "example"    "of"    "a"    "short"    "sentence"    "second"

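In the output above, Vocabulary lists the words in the order they first appear, whereas unique sorts by default. Without Text Analytics Toolbox, a cell-array equivalent for whitespace-separated text can use the 'stable' option of unique. This is a sketch under that assumption only; it does not reproduce tokenizedDocument's full tokenization rules (for example, its handling of punctuation). The sentences are adapted from the example above with the trailing space removed.

```matlab
% Sample sentences adapted from the example above, as char vectors (no toolbox needed)
sentences = {'an example of a short sentence  an example of a short sentence', ...
             'a second short sentence a second short sentence'};

% strsplit with no delimiter splits on whitespace and, by default,
% collapses consecutive spaces into one delimiter
words = cellfun(@strsplit, sentences, 'UniformOutput', false);
allWords = [words{:}];

% 'stable' keeps the order of first appearance instead of sorting
vocab = unique(allWords, 'stable')
```

For this input, vocab should contain the same seven words, in the same order, as the Vocabulary output above.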