Fast way to perform multiple searches on a large array
Ältere Kommentare anzeigen
I have a large time series array (10,000,000 elements) :
ts = [2; 1; 3; 4; 6; 7; .......]
I have a corresponding time array (same size as the above) :
times = [d1; d2; d3; d4; d5.......]
I have 2 arrays of start times and end times (also large ~ 30000 elements):
st = [dd1 dd2 dd3 ....]
en = [de1 de2 de3 ....]
I need to create a new matrix with many many finds. Logic is :
results = NaN(300, numel(st));
for i=1:numel(st);
temp = ts(find(times > st(i) & times < en(i) , 300,'first');
results(:,i) = temp;
end;
Is there any ay I do this faster (ideally without a loop) ?
- I have a 64 bit version so I can try a large in-memory solution.
Many thanks in advance, Nigel
8 Kommentare
Jan
am 4 Okt. 2011
I assume "time" should be "times" inside the loop.
Nigel
am 4 Okt. 2011
Andrei Bobrov
am 4 Okt. 2011
What size of 'st'?
Nigel
am 4 Okt. 2011
Jan
am 4 Okt. 2011
Do the intervals [st(i):en(i)] overlap?
Nigel
am 4 Okt. 2011
Daniel Shub
am 4 Okt. 2011
Just to confirm times, st and en are all sorted?
Nigel
am 4 Okt. 2011
Akzeptierte Antwort
Weitere Antworten (2)
Jan
am 4 Okt. 2011
Never let an array grow in each iteration! Pre-allocate the output:
results = NaN(300, numel(st));
for i = 1:numel(st) % Not size(st), which is a vector!
temp = ts(find(times > st(i) & times < en(i), 300, 'first');
if length(temp) == 300
results(:, i) = temp;
else
results(1:length(temp), i) = temp;
end
end
results = results(~isnan(results));
If st and times are sorted, it wastes a lot of time to compare all values. But for vectorizing this, a very large matrix would be needed, such that I assume it will be slower than the loop.
Can you solve the problem by using HISTC?
6 Kommentare
Nigel
am 4 Okt. 2011
Jan
am 4 Okt. 2011
Well, then this answer wasted your and my time. Please include the pre-allocation and all other relevant details, if you post a code-snippet and ask for a speed improvement. Otherwise the trials to optimize your code start from a very unrealistic point.
Nigel
am 4 Okt. 2011
Teja Muppirala
am 4 Okt. 2011
Since you say you know it will always have 300 entries, then this:
find(times > st(i) & times < en(i), 300, 'first');
May as well be this:
find(times > st(i), 300, 'first');
Daniel Shub
am 4 Okt. 2011
and since times and st are sorted
0:299+find(times > st(i), 1, 'first')
Nigel
am 4 Okt. 2011
Nigel
am 4 Okt. 2011
0 Stimmen
2 Kommentare
Bjorn Gustavsson
am 10 Okt. 2011
Well then at least do the consequtive 'find's on shortened sections of times (with 'offset' as in Daniel's example):
idx = find(times(offset:end) > st(i), 1,'first');
Then you'd get the benefit from increasingly shorter arrays to search over but without loosing the data.
Daniel Shub
am 10 Okt. 2011
I wonder if this would be faster. I would hope MATLAB is smart enough not to have to reallocate memory for my method. Yours is probably a little safer. I was also thinking that working from the end backwards might ultimately be the fastest.
Kategorien
Mehr zu Operators and Elementary Operations finden Sie in Hilfe-Center und File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!