Fastest way to index large arrays

Vittorio Picco on 4 Oct 2022
Commented: dpb on 5 Oct 2022
I have two sets of arrays, A and B. The "A" arrays have about 1 million elements. The "B" arrays have about 65 thousand elements. For every element in A I need to find the corresponding element in B and pull a related value. Here's a crude minimal working example:
PhiA = round(359*rand(1,1e6));
ThetaA = round(179*rand(1,1e6));
PhiB = repmat(0:359,1,180);
ThetaB = reshape(repmat(0:179,360,1),1,[]);
VarB = 1:180*360;
out = nan(1e6,1);
tic
for loop = 1:length(PhiA)
    % logical mask over all of B, rebuilt for every element of A
    idx = PhiA(loop) == PhiB & ThetaA(loop) == ThetaB;
    out(loop) = VarB(idx);
end
toc
Given the size of the arrays this is not very fast, over 40 seconds on my machine. The profiler tells me that those two lines in the for loop are the slowest in my code, and surprisingly they split the burden almost exactly 50/50.
This is actually my already faster version: originally A and B were tables and the profiler told me that the slow operations were accessing and storing into the tables. Switching to arrays has sped up things a little but not as much as I hoped.
How could I make this faster?

Accepted Answer

dpb on 4 Oct 2022
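With the lookup arrays structured as they are, you don't need a lookup at all; you can just calculate the index directly. PhiB cycles through 0:359 and ThetaB increases by one every 360 elements, so the pair (phi, theta) sits at position phi + 360*theta + 1: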
fnRow=@(phi,theta)phi+360*theta+1;
so, with this,
PhiA = round(359*rand(1,1e6));
ThetaA = round(179*rand(1,1e6));
PhiB = repmat(0:359,1,180);
ThetaB = reshape(repmat(0:179,360,1),1,[]);
VarB = 1:180*360;
tic
out=VarB(fnRow(PhiA,ThetaA));
toc
Elapsed time is 0.012699 seconds.
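As a sanity check, the closed-form index can be compared against the original masked search on a small random sample. This is only a sketch: nCheck, outFast and outRef are names made up for illustration, and the sample size is an arbitrary choice to keep the brute-force loop quick.

```matlab
% Rebuild the lookup arrays from the question
PhiB   = repmat(0:359,1,180);
ThetaB = reshape(repmat(0:179,360,1),1,[]);
VarB   = 1:180*360;

% Closed-form index from the answer: phi varies fastest, theta every 360 elements
fnRow = @(phi,theta) phi + 360*theta + 1;

% Small random sample (nCheck is a hypothetical, arbitrary size)
nCheck = 1000;
PhiA   = round(359*rand(1,nCheck));
ThetaA = round(179*rand(1,nCheck));

% Vectorized result
outFast = VarB(fnRow(PhiA,ThetaA));

% Reference result using the original masked search
outRef = nan(1,nCheck);
for k = 1:nCheck
    outRef(k) = VarB(PhiA(k) == PhiB & ThetaA(k) == ThetaB);
end

% Both approaches should agree element for element
assert(isequal(outFast,outRef))
```

The shortcut only holds while B keeps this regular grid layout; if B were reordered or had gaps, the formula would no longer point at the right element.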
4 Comments
Vittorio Picco on 5 Oct 2022
Yeah, it worked out. I can round to make integers so that's not a problem. The problem was that array A occasionally contains NaN, which are entries I need to skip, but that made the last line fail. The way I dealt with it was by appending a dummy value to the end of the VarB array, and by replacing the NaN with this new index; that made the out= assignment work. Then I replaced the dummy entries back with NaNs. All of that could be done without for loops so my execution time remained almost unaffected. I wonder how you would have dealt with it. I'm not good at anonymous functions so I never think about them.
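The dummy-entry workaround described above might look roughly like the following. This is a sketch on toy data, not the poster's actual code: VarBpad and idx are hypothetical names, and it uses NaN itself as the dummy value, which makes the final "replace back with NaN" step unnecessary.

```matlab
% Toy data with one missing entry (values chosen for illustration only)
PhiA   = [10 NaN 359 0];
ThetaA = [ 5 NaN 179 0];
VarB   = 1:180*360;
fnRow  = @(phi,theta) phi + 360*theta + 1;

% Append a dummy slot to the lookup values; here the dummy is NaN
VarBpad = [VarB NaN];

% NaN inputs propagate to NaN indices, which are not valid subscripts
idx = fnRow(PhiA,ThetaA);

% Redirect every NaN index to the dummy slot at the end
idx(isnan(idx)) = numel(VarBpad);

% Missing entries now come back as NaN, and out keeps the size of PhiA
out = VarBpad(idx);
```

With these inputs, out keeps four elements and the second one is NaN.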
dpb on 5 Oct 2022
I probably would have simply used logical addressing in the calculation selection...
isOK=all(isfinite(A),2);
out=VarB(fnRow(PhiA(isOK),ThetaA(isOK)));
The above assumes A is the array of interest, with one entry per row, and keeps only the rows that have no missing values.
If out must be the same size as A in the row dimension, preallocate it (for example with NaN) and assign into out(isOK); as written above, out holds only the valid entries. Without preallocation, an assignment into out(isOK) makes out only as large as the position of the last non-missing element in A. That only matters if the last N elements are the missing ones, but you may have no way of knowing that won't be the case, so defensive coding would preallocate.
If the above is more like the way the code is constructed, then
isOK=all(isfinite([PhiA.' ThetaA.']),2);
looks ominous but will be fast and is easier to write than the two conditions on each vector with &
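Putting the mask and the preallocation together, a minimal sketch on toy data could look like this. The sample values are made up, and the mask is transposed to a row here only so that it matches the orientation of the row vectors; note that isfinite is applied element-wise first and all then reduces each row.

```matlab
% Toy data: missing values in the middle and at the very end
PhiA   = [10 NaN 359 0 NaN];
ThetaA = [ 5   7 179 0 NaN];
VarB   = 1:180*360;
fnRow  = @(phi,theta) phi + 360*theta + 1;

% A pair is usable only if both coordinates are finite
isOK = all(isfinite([PhiA.' ThetaA.']),2).';

% Preallocate so out keeps the size of PhiA even when trailing entries are missing
out = nan(size(PhiA));

% Fill in only the valid positions; the rest stay NaN
out(isOK) = VarB(fnRow(PhiA(isOK),ThetaA(isOK)));
```

Because the last pair is missing, skipping the preallocation here would leave out with four elements instead of five.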

Version

R2022a