Histogram with overlapping bins

Is there a fast way to code this?
Say X = [101 202 303 505] is the set of values to be binned,
and Y = [0 100 200 300 400; 200 300 400 500 600] holds the bin edges, with the first row containing the lower bin edges and the second row containing the upper bin edges (so that successive bins are 0-200, 100-300, 200-400, 300-500, and 400-600),
and the result is [1,2,2,1,1].
Normally I would code this as:
out = NaN(1,size(Y,1));
for i = 1:length(out)
    out(i) = length(find(X<=Y(2,i) & X>Y(1,i)));
end
Is there a faster or more succinct way, using a vectorized function?
Thanks, Suresh

Accepted Answer

Jan on 24 Feb 2011

1 vote

At first I'd use SUM:
out = NaN(1,size(Y,2)); % Edited: 1->2
for i = 1:length(out)
    out(i) = sum(X<=Y(2,i) & X>Y(1,i));
end
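The loop above can also be collapsed into a single comparison of every value against every bin. The following is only a sketch using the data from the question; the name `inBin` is illustrative and not from the thread. It builds a numel(X)-by-nbins logical matrix, so memory grows with the product of the two sizes.

```matlab
X = [101 202 303 505];
Y = [0 100 200 300 400; 200 300 400 500 600];

% X(:) is a column and Y(k,:) is a row, so bsxfun expands both
% to a numel(X)-by-size(Y,2) matrix: one row per value, one column per bin.
inBin = bsxfun(@gt, X(:), Y(1,:)) & bsxfun(@le, X(:), Y(2,:));

% Summing down each column counts the values that fall in that bin.
out = sum(inBin, 1);
```

In R2016b and later, implicit expansion allows the same comparison to be written as `X(:) > Y(1,:) & X(:) <= Y(2,:)` without bsxfun.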
But for large arrays, HISTC is much faster:
X = rand(1, 10000)*1000;
Y = 0:100:1000;
N = histc(X, Y);
N_200blocks = N + [N(2:end), 0];
EDITED: (Walter discovered my misunderstanding about the last bin.) Read the help text of HISTC regarding the last element of N_200blocks. I assume you can omit it in the output.
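Applied to the data from the question, the same idea might look like the sketch below. It assumes the bins are 200 wide and shifted by 100, as in the example, so that every overlapping bin is the union of two adjacent 100-wide sub-bins; the names `edges`, `n`, and `n200` are illustrative. Note that HISTC treats each sub-bin as closed on the left, while the loop in the question is closed on the right, so values lying exactly on an edge are counted differently.

```matlab
X = [101 202 303 505];
edges = 0:100:600;                % non-overlapping 100-wide sub-bins

% n(k) counts edges(k) <= x < edges(k+1); n(end) counts x == edges(end)
n = histc(X, edges);

% Add neighbouring sub-bins: 0-200, 100-300, 200-400, 300-500, 400-600.
% The last element of n (x exactly 600) is left out.
n200 = n(1:end-2) + n(2:end-1);
```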

5 Comments

Walter Roberson on 24 Feb 2011
Jan, I believe that should be "equal to" not "greater than".
Jan on 24 Feb 2011
@Walter: Correct. Thanks for checking. The help text of HISTC confused me.
s k on 24 Feb 2011
Thanks for the sum hint, I guess it is faster than length(find()). And I know about histc, but it does not work for overlapping bins, does it?
In case this threw you off, there was a mistake in the first line: I meant out=NaN(1,size(Y,2)) and not out=NaN(1,size(Y,1)).
Jan on 24 Feb 2011
No, HISTC does not work for overlapping bins. Therefore I split the overlapping intervals into non-overlapping ones and add the contents of the separate bins such that the results equal the overlapping bins. Example: n = HISTC(X, [0,100,200,300]) gives n = [1x4]. Now the number of elements in 0:200 is n(1)+n(2), and for 100:300 it is n(2)+n(3), or according to your data n(2)+n(3)+n(4). As long as all bins overlap pairwise, this method works.
Did you run my code?
s k on 25 Feb 2011
Ahh yes, I see. I did not notice that you had changed the bin width to 100. Yes, this works, of course, for the question that I asked. Thanks!


More Answers (1)

Bruno Luong on 24 Feb 2011

1 vote

You might try this code using my mcolon function:
% Data
Y=[0 100 200 300 400;
200 300 400 500 600]
X= [101 202 303 505]
% Full vectorized Engine
lo = Y(1,:);
hi = Y(2,:);
nbin = size(lo,2);
[~, ilo] = histc(X, [lo Inf]);
[~, ihi] = histc(X, [-Inf hi]);
% Test if they belong to the bracket
tf = ilo & ihi & (ilo >= ihi);
left = ihi(tf);
right = ilo(tf);
loc = mcolon(left,right); % FEX
count = accumarray(loc(:),1,[nbin 1])'
Bin membership follows the closed-left/open-right bracket convention. Reverse the sign of X and Y if you prefer the opposite.
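If mcolon from the File Exchange is not at hand, the last two lines could be replaced by a running sum. This is a swapped-in technique, not part of the answer above: it assumes that mcolon(left,right) concatenates the ranges left(k):right(k), and that lo and hi are each sorted in ascending order, as HISTC requires. The variable `d` is illustrative.

```matlab
X = [101 202 303 505];
Y = [0 100 200 300 400; 200 300 400 500 600];
lo = Y(1,:);
hi = Y(2,:);
nbin = numel(lo);

[~, ilo] = histc(X, [lo Inf]);    % index of the last bin with lo <= x
[~, ihi] = histc(X, [-Inf hi]);   % index of the first bin with hi > x
tf = ilo & ihi & (ilo >= ihi);
left = ihi(tf);
right = ilo(tf);

% Each value belongs to bins left(k)..right(k): mark +1 where that run
% starts and -1 just past its end, then accumulate with cumsum.
d = accumarray(left(:), 1, [nbin+1 1]) - accumarray(right(:)+1, 1, [nbin+1 1]);
count = cumsum(d(1:nbin)).';
```

The extra element in `d` only absorbs the -1 markers of runs that end at the last bin.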

2 Comments

s k on 25 Feb 2011
This seems like the more generic answer that I was looking for, since it looks like it works for arbitrary bins (not all of the same bin width, etc.)! I have to study it a bit to figure out what it is doing.
Bruno Luong on 25 Feb 2011
I never saw that equal bin widths or pairwise overlap had been specified in the question. It just appears that way in the example.


Asked: s k on 24 Feb 2011