Traversing Text Document Matlab

xRobot on 17 Nov 2019
Edited: Adam Danz on 19 Nov 2019
Please provide guidance on this question. All responses are highly valued and will be used to further my knowledge (I'm not just looking for a copy-and-paste solution). I am attempting to read a Microsoft Word dictionary into MATLAB. From there I would like to traverse it, extract words of a specific length (say, four-letter words), and put them into an array. Then I would like to select random words from the array and put them into a matrix.

Answers (1)

Adam Danz on 17 Nov 2019
Edited: Adam Danz on 17 Nov 2019
Reading from word doc
Here's the general approach to reading a Microsoft Word document.
directory = 'C:\Users\AOC\Documents\MATLAB';
file = 'myDocFile.docx';
% Full path to the MS Word file
filePath = fullfile(directory,file);
% Read MS Word file using actxserver function
word = actxserver('Word.Application');
wdoc = word.Documents.Open(filePath);
txt = wdoc.Content.Text;
Quit(word)
delete(word)
The variable txt is a char array containing the text in your document.
Extracting 4-letter words
There are several approaches you could use. This one is fast and doesn't require segmenting the text into words and counting the length of each one. Instead, it uses a regular expression to search for this pattern:
[non-letter],[4-letters],[non-letter]
It also uses strtrim() to remove the leading and trailing white space.
% Extract 4-letter words.
s = strtrim(regexp(txt, '([^a-zA-Z])[a-zA-Z]{4}([^a-zA-Z])', 'match'));
s is a 1xn cell array of 4-letter words stored as character arrays.
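One caveat with the pattern above: each match consumes the non-letter on both sides. Two adjacent 4-letter words share a single delimiter, so the second can be skipped, and punctuation next to a word is kept because strtrim() only removes white space. A possible variant, sketched here with a made-up sample sentence in place of the real document text, uses lookaround assertions so the neighbouring characters are checked but not included in the match:

```matlab
% Hypothetical sample text; in practice txt would come from the Word document.
txt = 'The quick brown fox jumps over that lazy dog. Some word, here!';

% (?<![a-zA-Z]) asserts that no letter precedes the 4-letter run and
% (?![a-zA-Z]) asserts that no letter follows it. Neither assertion
% consumes a character, so adjacent words and punctuation are handled.
s = regexp(txt, '(?<![a-zA-Z])[a-zA-Z]{4}(?![a-zA-Z])', 'match');

disp(s)   % 1xn cell array of 4-letter words
```

Because the matches contain only letters, strtrim() is no longer needed in this variant.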
Randomly select words
Words can't be stored in a numeric matrix, but you can put them into a cell array. The example below chooses n random words from the extracted list.
n = 10;
if n > numel(s)
    error('There are only %d words available. You selected %d words.', numel(s), n)
end
% randi samples with replacement, so the same word may be picked more than once
randIdx = randi(numel(s),1,n);
randWords = s(randIdx); % Here is your random selection
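If each word should appear only once, and the selection should be laid out in rows and columns as the question describes, one possible sketch is to draw unique indices with randperm() and reshape the result. The word list and the 2-by-5 grid size below are assumptions for illustration only:

```matlab
% Hypothetical word list standing in for the extracted 4-letter words.
s = {'tree','bird','lake','rock','wind','fire', ...
     'snow','leaf','moon','star','rain','sand'};

nRows = 2;   % assumed grid height
nCols = 5;   % assumed grid width
n = nRows * nCols;

if n > numel(s)
    error('There are only %d words available. You selected %d words.', numel(s), n)
end

% randperm(N, k) returns k unique integers from 1..N (no repeats).
randIdx = randperm(numel(s), n);

% Arrange the selection as an nRows-by-nCols cell array of words.
wordGrid = reshape(s(randIdx), nRows, nCols);
disp(wordGrid)
```

With unique sampling, the check against numel(s) is what guarantees that enough distinct words exist.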
5 Comments
xRobot on 19 Nov 2019
fileID = fopen('mylist.odt','r');
formatSpec = '%s';
words = fscanf(fileID,formatSpec);
I have used the above code to read in the file. It read in as a 1x11102 char. What I would like to do is convert this to a string array.
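A note on why the result is one long char vector: with the '%s' format, fscanf() skips white space and concatenates everything it reads, so the word boundaries are lost. A minimal sketch of one way to get a string array instead is shown below. It assumes the word list is available as plain text with words separated by white space (an .odt file is a compressed archive rather than plain text, so the list would first need to be saved as text). The sample text and the file name in the comment are hypothetical:

```matlab
% Hypothetical sample standing in for the file contents. With a plain-text
% file, something like txt = fileread('mylist.txt') could be used instead.
txt = sprintf('tree house\nbird lake stone');

% strsplit() splits on white space by default and collapses repeated
% delimiters; string() converts the resulting cell array to a string array.
words = string(strsplit(strtrim(txt)));

% Keep only the words with exactly four characters.
fourLetter = words(strlength(words) == 4);
disp(fourLetter)
```

Reading the whole file as text keeps the separators intact, which is what makes the split possible.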
Adam Danz on 19 Nov 2019
Edited: Adam Danz on 19 Nov 2019
