code with the same function
This question is closed.
The code below correctly counts how many times a letter occurs in a large text file (text.txt).
I would like another simple piece of code that does the same thing, i.e. gives me the count of each letter in the text (A = number of letters a, B = number of letters b, and so on).
If nothing simpler exists, I would also accept something more complicated or at the same level of difficulty.
fileread('mytextfile.txt')
data = fileread('mytextfile.txt');
nnz(data=='A')
nnz(ismember(data,'A'))
Answers (2)
Walter Roberson
on 3 Apr 2019
[A, ~, AA] = unique(data);
fprintf('%c = %d\n', [A, accumarray(AA, 1)].')
8 comments
Walter Roberson
on 3 Apr 2019
Note that I had already answered you on this matter at https://www.mathworks.com/matlabcentral/answers/453555-help-me-please-please?s_tid=prof_contriblnk#answer_368356
Gabriel Cunha
on 3 Apr 2019
Edited: per isakson
on 4 Apr 2019
Rik
on 3 Apr 2019
It is a bit easier to resolve the error in his previous answer:
%random test data instead of fileread:
%data=char(randi([64 65+25],1,40));data(data==64)=' ';
data = fileread('mytextfile.txt');
[a, ~, aa] = find(accumarray(reshape(double(data),[],1), 1));
fprintf('%c = %d\n', [a(:).'; aa(:).']);
Walter Roberson
on 3 Apr 2019
Edited: Walter Roberson
on 4 Apr 2019
fprintf('%c = %d\n', [0+A(:), accumarray(AA, 1)].')
Rik
on 4 Apr 2019
Curiously, this doesn't seem to work for documents as large as a Bible translation (which seems to be the goal). I have attached a public domain translation for testing. Notice the difference between the two methods for lower case common letters. The accumarray seems to cap out at 65535.
data=fileread('WEB.txt');
clc
[A, ~, AA] = unique(data);
fprintf('%c = %d\n', [A(:), accumarray(AA, 1)].')
char_list=min(data):max(data);
counts=histc(data,char_list);
char_list(counts==0)=[];
counts(counts==0)=[];
fprintf('%c = %d\n', [char_list',counts'].')
Walter Roberson
on 4 Apr 2019
double(char_list).'
Otherwise the char data type has priority over numeric in determining the data type of the concatenation.
Rik
on 4 Apr 2019
Despite its name, char_list is already a double. I didn't notice your last edit with 0+A(:), so that is why that method was capped (chars are limited to 16 bits).
Walter Roberson
on 4 Apr 2019
I added the 0+ after you (correctly) pointed out the 65535 cap.
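A minimal sketch of the effect discussed in these comments, using made-up counts rather than the real file (the variable names and values here are illustrative assumptions, not taken from the thread). When a char column is concatenated with a numeric column, the result is char, so large counts cannot survive; converting the characters to double first keeps the counts intact.

```matlab
% Illustrative values only: two characters and two hypothetical counts,
% one of which is larger than a 16-bit char can represent.
letters = 'ab';
counts  = [70000; 12];

% char has priority in the concatenation, so the result is a char array
% and the large count is no longer stored as 70000 (the thread reports
% a cap at 65535).
mixed  = [letters(:), counts];
capped = double(mixed(:, 2));

% Converting the characters to double first (equivalent to 0+letters(:))
% keeps everything numeric, so the counts are preserved.
fixed = [double(letters(:)), counts];
fprintf('%c = %d\n', fixed.');
```

With the all-double matrix, `%c` still prints the character for each code, so the output format is unchanged.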
There are two easy options: a loop and a histogram:
%for loop method:
data = fileread('mytextfile.txt');
letters='ABCDEFGHIJKLMNOPQRSTUVWXYZ';
counts=zeros(1,numel(letters));
for n=1:numel(letters)
counts(n)=nnz(data==letters(n));
end
%histogram method:
data = fileread('mytextfile.txt');
counts=histc(data,65:(65+25));
4 comments
Gabriel Cunha
on 4 Apr 2019
Rik
on 4 Apr 2019
Those are the ASCII value of 'A' (65) and the number of letters in the alphabet minus 1 (25). But you should probably be using something like this:
char_list=min(data):max(data);
counts=histc(data,char_list);
char_list(counts==0)=[];
counts(counts==0)=[];
fprintf('%c = %d\n', [char_list',counts'].')
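As a hedged alternative for the A to Z case, the constants can be derived from the characters themselves, and `histcounts` can stand in for `histc`. This is only a sketch: the sample string replaces `fileread`, and the names `first`, `last`, `edges` and `codes` are introduced here for illustration.

```matlab
% Made-up sample text instead of fileread('mytextfile.txt').
data = 'ABBA CAB';

first = double('A');   % 65
last  = double('Z');   % 90

% 27 edges give 26 unit-wide bins centred on the codes 65..90.
% Characters outside that range (such as the space) are ignored.
edges  = (first:last+1) - 0.5;
counts = histcounts(double(data), edges);

% Print only the letters that actually occur.
codes = first:last;
idx   = counts > 0;
fprintf('%c = %d\n', [codes(idx); counts(idx)]);
```

For the sample string this counts three A's, three B's and one C. To count lower-case letters together with upper-case ones, `upper(data)` could be passed instead of `data`.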
Gabriel Cunha
on 4 Apr 2019
Rik
on 4 Apr 2019
The edited for-loop method should be a bit easier to understand.