# Improving the speed of a double for loop through vectorization or changing summation indices

3 views (last 30 days)
Matthew Kehoe on 15 Jul 2021
Edited: Matthew Kehoe on 22 Jul 2021
I'm writing a double for loop that computes a Pade summation subject to specific coefficients and parameters. I'm curious if there is a more efficient way to write my double for loop. I think that I should avoid writing both cos(theta)^(p-q) and sin(theta)^q as the power operation is expensive.
Here is my current implementation (and three other implementations that are slightly faster).
clc; clear all;
% Test data for function
epsilon = 0.3;
delta = 0.1;
N = 500;
M = 500;
% My code passes through complex coefficients of size (N+1,M+1)
c = rand(N+1,M+1) + 1i*rand(N+1,M+1);
rho = sqrt(epsilon^2 + delta^2);
theta = atan2(delta,epsilon);
% Form the ctilde
N_M_min = min(N,M);
coeff = zeros(N_M_min+1,1);
coeff2 = zeros(N_M_min+1,1);
coeff3 = zeros(N_M_min+1,1);
coeff4 = zeros(N_M_min+1,1);
ntests = 100; % number of test iterations to average exec time
% Method 1 - Original Implementation
tic
for N = 1:ntests
for p=0:N_M_min
for q=0:p
coeff(p+1) = coeff(p+1) + c(p-q+1,q+1)*cos(theta)^(p-q)*sin(theta)^q;
end
end
end
toc/ntests
% Method 2
tic
for N = 1:ntests
for p=0:N_M_min
c2 = 1;
for q=0:p
if (q == 0)
c2 = 1;
else
c2 = c2*sin(theta);
end
coeff2(p+1) = coeff2(p+1) + c(p-q+1,q+1)*cos(theta)^(p-q)*c2;
end
end
end
toc/ntests
% Method 3
tic
c1 = cos(theta).^(0:N_M_min);
c2 = sin(theta).^(0:N_M_min);
for N = 1:ntests
for p=0:N_M_min
for q=0:p
coeff3(p+1) = coeff3(p+1) + c(p-q+1,q+1)*c1(p-q+1)*c2(q+1);
end
end
end
toc/ntests
% Method 4 - Fastest but slightly less accurate than method 3
tic
for N = 1:ntests
vander = (cos(theta).^(0:N_M_min).') .* (sin(theta).^(0:N_M_min));
for p=0:N_M_min
for q=0:p
coeff4(p+1) = coeff4(p+1) + c(p-q+1,q+1)*vander(p-q+1,q+1);
end
end
end
toc/ntests
diff = norm(coeff-coeff2,inf); % is tiny, 3.3e-21
diff2 = norm(coeff-coeff3,inf); % is zero
diff3 = norm(coeff-coeff4,inf); % is tiny 3.5e-15 (not zero as in method 3)
My second implementation runs almost twice as fast locally. The third and fourth implementation increase the speed by around a factor of 4.
The average executive time for method 1 is 0.0589 seconds.
The average executive time for method 2 is 0.0392 seconds.
The average executive time for method 3 is 0.0186 seconds.
The average executive time for method 4 is 0.0116 seconds.
However, I don't know how to speed up the calculation for the coefficients cos(theta)^(p-q). Is there a more efficient way of writing this double for loop through avoiding the power operation or using vectorization (my Matlab code will call this subroutine more than one hundred thousand times)?
##### 0 CommentsShowHide -1 older comments

Sign in to comment.

### Answers (1)

David Hill on 15 Jul 2021
Edited: David Hill on 15 Jul 2021
What is rep doing? Seems to just be multiplying.
for rep = 1:10000
for p=0:N_M_min
for q=0:p
coeff(p+1) = coeff(p+1) + c(p-q+1,q+1)*cos(theta)^(p-q)*sin(theta)^q;%there is no rep in the equations
end
end
end
I believe all your nested loops can be consolidated as such:
epsilon = 0.3;
delta = 0.1;
N = 50;
M = 50;
c = rand(N+1,M+1) + 1i*rand(N+1,M+1);
rho = sqrt(epsilon^2 + delta^2);
theta = atan2(delta,epsilon);
N_M_min = min(N,M);
tic;
c=rot90(c);
coeff=10000*arrayfun(@(x)sum(diag(c,x-N_M_min)'.*cos(theta).^(x:-1:0).*sin(theta).^(0:x)),0:N_M_min);
toc;
This runs in about 0.003 s.
##### 1 CommentShowHide None
Matthew Kehoe on 15 Jul 2021
Edited: Matthew Kehoe on 22 Jul 2021
I only included rep for testing purposes (I shouldn't have included it and have edited my original post). The 51 coefficient values are different as
clear all;
clc;
epsilon = 0.3;
delta = 0.1;
N = 50;
M = 50;
c = rand(N+1,M+1) + 1i*rand(N+1,M+1);
rho = sqrt(epsilon^2 + delta^2);
theta = atan2(delta,epsilon);
N_M_min = min(N,M);
coeff = zeros(N_M_min+1,1);
% Method 1 - Original Matlab code
tic
for p=0:N_M_min
for q=0:p
coeff(p+1) = coeff(p+1) + c(p-q+1,q+1)*cos(theta)^(p-q)*sin(theta)^q;
end
end
toc
% Method 2
tic
c=rot90(c);
coeff2=arrayfun(@(x)sum(diag(c,x-N_M_min)'.*cos(theta).^(x:-1:0).*sin(theta).^(0:x)),0:N_M_min);
coeff2=coeff2';
toc
diff = norm(coeff - coeff2,inf); % returns 0.6371
% The average execution time for method 1 is 0.088334 seconds.
% The average execution time for method 1 is 0.305050 seconds.
It appears that calling arrayfun() is slower than writing for loops.

Sign in to comment.

### Categories

Find more on Matrix Indexing in Help Center and File Exchange

R2019a

### Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by