Fast way to perform multiple searches on a large array

Question

I have a large time series array (10,000,000 elements):

ts = [2; 1; 3; 4; 6; 7; .......]

I have a corresponding time array (the same size as the one above):

times = [d1; d2; d3; d4; d5.......]

I have two arrays of start times and end times (also large, about 30,000 elements):

st = [dd1 dd2 dd3 ....]
en = [de1 de2 de3 ....]

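I need to build a new matrix using many, many calls to find. The logic is:

results = NaN(300, numel(st));
for i = 1:numel(st)
  % values of the first 300 samples strictly inside (st(i), en(i))
  temp = ts(find(times > st(i) & times < en(i), 300, 'first'));
  results(:,i) = temp;   % errors if fewer than 300 samples are found
end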

Is there any way I can do this faster (ideally without a loop)?

I have a 64-bit version, so I can try a large in-memory solution.

Many thanks in advance, Nigel

8 comments

Daniel Shub on 4 Oct 2011

Just to confirm: times, st, and en are all sorted?

Nigel on 4 Oct 2011

Yes, they are sorted by st, and en(i) - st(i) = 300 seconds.

Answer 1

Daniel Shub on 4 Oct 2011

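I think that by discarding the times you have already passed, you might be able to speed up the find. If st(i+1) > en(i), you could discard even more elements, but I think the savings would be small. This code relies on times, st, and en being sorted.

results = NaN(300, numel(st));
offset = 0;                     % number of samples already discarded from times
for i = 1:numel(st)
  % first remaining sample strictly after st(i)
  idx = find(times > st(i), 1, 'first');
  offset = offset + idx - 1;    % absolute index of that sample is offset + 1
  times = times(idx:end);       % discard the samples before it
  % next 300 samples (assumes at least offset + 300 samples exist)
  results(:, i) = ts(offset + (1:300));
end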

1 comment

Nigel on 10 Oct 2011

Hi Daniel,

I used a modified version of your solution. Indeed, it is a LOT quicker to search over smaller arrays.

Thank you all for your help.

N.

Answer 2

Jan on 4 Oct 2011

Never let an array grow in each iteration! Pre-allocate the output:

results = NaN(300, numel(st));
for i = 1:numel(st)   % Not size(st), which is a vector!
  temp = ts(find(times > st(i) & times < en(i), 300, 'first'));
  if length(temp) == 300
    results(:, i) = temp;
  else
    % fewer than 300 samples in this window: the rest of the column stays NaN
    results(1:length(temp), i) = temp;
  end
end
results = results(~isnan(results));   % drops the NaN padding; returns a column vector

If st and times are sorted, comparing all values on every iteration wastes a lot of time. Vectorizing this, however, would require a very large matrix, so I assume it would be slower than the loop.
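To put a rough number on "a very large matrix": comparing every sample time against every start time at once (for example with bsxfun(@gt, times, st')) produces a logical matrix of size numel(times) × numel(st). With the sizes from the question, and MATLAB storing one byte per logical element:

```latex
$$
10^{7} \times 3\times 10^{4} = 3\times 10^{11}\ \text{elements}
\;\Rightarrow\; 3\times 10^{11}\ \text{bytes} \approx 300\ \text{GB}
$$
```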

Can you solve the problem by using HISTC?
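On the HISTC idea: because times is sorted, one histc call can locate the start of every window at once instead of running one find per window. Below is a minimal sketch on small made-up data; the values of times and st and the names bin and first are illustrative assumptions. It assumes every st(i) is smaller than times(end), because histc returns bin 0 for values beyond the last edge.

```matlab
% Illustrative data (assumption): sorted sample times and sorted window starts
times = (0:0.5:20)';
st    = [1.2; 3.0; 7.75];

% histc(x, edges) returns in its second output the bin k of each x,
% i.e. edges(k) <= x < edges(k+1). With edges = times, the first
% sample strictly after st(i) is therefore times(bin(i) + 1).
[~, bin] = histc(st, times);
first = bin + 1;       % first(i) == find(times > st(i), 1, 'first')

disp(first')           % 4 8 17
```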

6 comments

Daniel Shub on 4 Oct 2011

And since times and st are sorted, the indices you need are:

(0:299) + find(times > st(i), 1, 'first')
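The parentheses around 0:299 matter: in MATLAB the colon operator has lower precedence than +, so 0:299+k means 0:(299+k), a longer range that also starts at the invalid index 0. A small check, where k stands in for the result of find (a hypothetical value):

```matlab
k = 5;                  % hypothetical result of find(times > st(i), 1, 'first')
a = 0:299+k;            % parsed as 0:(299+k): 305 elements starting at 0
b = (0:299) + k;        % 300 consecutive indices starting at k

disp(numel(a))          % 305
disp([b(1) b(end)])     % 5 304
```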

Nigel on 4 Oct 2011

WOW, removing the < en(i) test nearly halved the processing time!!

Answer 3

Nigel on 4 Oct 2011

Removing the < en(i) test certainly helped. I'm a little hesitant to implement the part that discards past times, because I need that data for something later on.

Just for my own learning, I would really like to know how I could vectorise this operation so that I don't need a loop.
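One way to vectorise, sketched under assumptions: find all start indices with a single histc call (valid because times is sorted), then build a 300-by-numel(st) index matrix with bsxfun and read ts in one indexing operation. Like the answers above, it takes the first 300 samples after each st(i) and ignores en(i). The synthetic 1 Hz data and the names nWin, first, I, and valid are hypothetical, and it assumes every st(i) is smaller than times(end).

```matlab
% Hypothetical synthetic data (assumption): 1 Hz samples
times = (1:2000)';                 % sorted sample times, in seconds
ts    = sin(times / 50);           % one signal value per time
st    = (10:250:1510)';            % sorted window start times
nWin  = 300;                       % samples to take per window

% 1) All start indices at once: first sample strictly after each st(i)
[~, bin] = histc(st, times);
first = bin + 1;

% 2) Index matrix: column i holds first(i), first(i)+1, ..., first(i)+nWin-1
I = bsxfun(@plus, (0:nWin-1)', first(:)');

% 3) Leave NaN wherever a window runs past the end of the data
valid = I <= numel(ts);
results = NaN(nWin, numel(st));
results(valid) = ts(I(valid));
```

With the sizes in the question (300 samples × about 30,000 windows), I holds roughly 9×10^6 doubles, about 72 MB, which a 64-bit MATLAB should be able to hold; whether it actually beats the loop is worth timing on the real data.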

Thank you all once again for taking the time to look at and respond to my question.

N.

2 comments

Bjorn Gustavsson on 10 Oct 2011

Well, then at least do the consecutive find calls on shortened sections of times (with offset as in Daniel's example):

idx = find(times(offset+1:end) > st(i), 1, 'first');

Then you get the benefit of searching increasingly short arrays without losing the data.
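A complete loop along these lines might look like the sketch below; it keeps times intact and only moves offset forward. The synthetic data and the names nWin and lastIdx are hypothetical. Note that times(offset+1:end) still builds a temporary array on each iteration, so whether this is faster than truncating times is something to measure.

```matlab
% Hypothetical synthetic data (assumption)
times = (1:2000)';
ts    = sin(times / 50);
st    = (10:250:1510)';
nWin  = 300;

results = NaN(nWin, numel(st));
offset  = 0;   % samples 1..offset are known to be <= the current st(i)
for i = 1:numel(st)
  % search only the not-yet-passed part of times; times itself is untouched
  idx = find(times(offset+1:end) > st(i), 1, 'first');
  if isempty(idx)
    break;     % no samples after st(i): remaining columns stay NaN
  end
  offset  = offset + idx - 1;                % absolute index of the hit is offset + 1
  lastIdx = min(offset + nWin, numel(ts));   % do not read past the end of ts
  results(1:lastIdx-offset, i) = ts(offset+1:lastIdx);
end
```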

Daniel Shub on 10 Oct 2011

I wonder if this would be faster. I would hope MATLAB is smart enough not to have to reallocate memory for my method. Yours is probably a little safer. I was also thinking that working from the end backwards might ultimately be the fastest.
