I want to efficiently count the number of occurences of numbers between 1-numel(num) in a Matrix. I came up with two options for that:
sz = [3000 2000];
mx = prod(sz);
num = randi([1 mx],sz);
tic;
% First option
counts = zeros(numel(num),1);
for i = 1:numel(num)
counts(num(i)) = counts(num(i)) + 1;
end
toc
tic;
% Second option
uni = unique(num);
uni = reshape(uni,[],1);
hc = histcounts(num,[uni;uni(end)]);
toc
Execution times are:
Elapsed time is 0.098540 seconds.
Elapsed time is 0.342214 seconds.
So option 1 is clearly faster. However the for loop bugs me. Is there any possibility to vectorize this?

3 Kommentare

Jan Siegmund
Jan Siegmund am 17 Okt. 2020
Bearbeitet: Jan Siegmund am 17 Okt. 2020
Note: The reshape is because unique always returns a column vector, but if it receives a row vector as input, is produces a row vector and the concat fails... gotta love that inconsistency
Adam Danz
Adam Danz am 18 Okt. 2020
"I want to efficiently count the number of occurences of numbers between 1-numel(num) in a Matrix"
I'm a bit lost. Your matrix contains integers between 1 and 1000 and has 6000000 values (3000x2000). So, why are you looking for 6000000 different values when you only have a max of 1000 values?
Sorry, num was a stupid example. A more suitable would be
sz = [3000 2000];
mx = prod(sz);
num = randi([1 mx],sz);
%...

Melden Sie sich an, um zu kommentieren.

 Akzeptierte Antwort

Matt J
Matt J am 18 Okt. 2020
Bearbeitet: Matt J am 18 Okt. 2020

1 Stimme

In this situation, accumarray will be faster than histcounts, but still not as fast as the for-loop,
tic;
hc=accumarray(num(:),1,[mx,1]).';
toc
Elapsed time is 0.172890 seconds. %for-loop
Elapsed time is 0.236013 seconds. %accumarray
unless the values are pre-sorted,
num=sort(num(:));
tic;
hc=accumarray(num(:),1,[mx,1]).';
toc
Elapsed time is 0.168976 seconds. %for-loop
Elapsed time is 0.075965 seconds. %accumarray
I think this is simply one of those situations where Matlab's for-loop optimization has caught up to vectorized code.

1 Kommentar

Jan Siegmund
Jan Siegmund am 18 Okt. 2020
Alright, accumarray is also a great choice. Thank you for your time and effort. This is the answer I am going to accept.

Melden Sie sich an, um zu kommentieren.

Weitere Antworten (2)

Matt J
Matt J am 18 Okt. 2020
Bearbeitet: Matt J am 18 Okt. 2020

0 Stimmen

I want to efficiently count the number of occurences of numbers between 1-numel(num) in a Matrix
If that's really what you want, then
hc = histcounts(num(:), 1:numel(num)+1 );
but as Adam points out, it would make more sense to have
hc = histcounts(num(:), 1:1001 );

3 Kommentare

Jan Siegmund
Jan Siegmund am 18 Okt. 2020
Bearbeitet: Jan Siegmund am 18 Okt. 2020
Yea, I wanted 1-numel(num). Sorry for the bad example.
Your proposal is great!
So I updated my example.
sz = [3000 2000];
mx = prod(sz);
num = randi([1 mx],sz);
tic;
counts = zeros(numel(num),1);
for i = 1:numel(num)
counts(num(i)) = counts(num(i)) + 1;
end
toc
tic;
hc = histcounts(num(:),1:mx+1);
toc
But histcounts is still slower than the subscripted increment:
Elapsed time is 0.296456 seconds.
Elapsed time is 0.984777 seconds.
I think this is due to the tons of branching done in histcounts.m . Is there any more efficient way?
Ok no, the branching is not the problem. Calling
matlab.internal.math.histcounts
directly only results in a minor improvement.
Matt J
Matt J am 18 Okt. 2020
Bearbeitet: Matt J am 18 Okt. 2020
I think the for-loop is the fastest for integer data ranges that large. Unfortunately (and strangely), histcounts cannot innately recognize that the data consists only of integers and use a simpler binning method for that case. There is an input option 'BinMethod'='integers' that is offered, however, it will not permit more than 65536 bins.

Melden Sie sich an, um zu kommentieren.

Kategorien

Produkte

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by