Massive time required for pdist

5 Ansichten (letzte 30 Tage)
Sebastian Stumpf
Sebastian Stumpf am 4 Okt. 2021
Hello,
I am using the Matlab function pdist to calculate the distance between two points. However, I noticed that the function needs a lot of time, despite it is using all four cores. I build this example to demonstrate the massive time comsumption. If I calculate the distance between two points with my own code, it is much faster. The example calculates the distance between a thousand points.
clear
close all
clc
tic
j=1;
X = rand(1000,2);
Y = rand(1000,2);
fprintf('Time for array creation: ');
toc
tic
for i = 1:1:size(Y,1)
for k = 1:1:size(X,1)
A(j,1) =sqrt((Y(i,1)-X(k,1))^2 + (Y(i,2)-X(k,2))^2);
j = j+1;
end
end
fprintf('Time for own distance calculation: ');
toc
j = 1;
tic
for i = 1:1:size(Y,1)
for k = 1:1:size(X,1)
P = [Y(i,1),Y(i,2);X(k,1),X(k,2)];
B(j,1) = pdist(P,'euclidean');
j = j+1;
end
end
fprintf('Time for distance calculation using Matlab function pdist: ');
toc
Output:
Time for array creation: Elapsed time is 0.000386 seconds.
Time for own distance calculation: Elapsed time is 0.251026 seconds.
Time for distance calculation using Matlab function pdist: Elapsed time is 10.776532 seconds.
You can clearly see, that the Matlab function pdist takes over 10 seconds longer.
My question is: Why? What else is this function doing?
Would be nice to know.
Thank you very much
Kind regards,
Sebastian

Akzeptierte Antwort

Chunru
Chunru am 4 Okt. 2021
Bearbeitet: Chunru am 4 Okt. 2021
%tic
X = rand(1000,2);
Y = rand(1000,2);
% fprintf('Time for array creation: ');
%toc
%% Version 1
tic
j=1;
for i = 1:1:size(Y,1)
for k = 1:1:size(X,1)
A(j,1) =sqrt((Y(i,1)-X(k,1))^2 + (Y(i,2)-X(k,2))^2);
j = j+1;
end
end
size(A)
ans = 1×2
1000000 1
t = toc;
fprintf('Time for own distance calculation: %.6f\n', t);
Time for own distance calculation: 0.307268
%% Version 1.1
% Pre-allocate A
tic
j=1;
A = inf(size(X,1)*size(Y,1), 1);
for i = 1:1:size(Y,1)
for k = 1:1:size(X,1)
A(j,1) =sqrt((Y(i,1)-X(k,1))^2 + (Y(i,2)-X(k,2))^2);
j = j+1;
end
end
size(A)
ans = 1×2
1000000 1
t = toc;
fprintf('Time for own distance calculation with preallocation: %.6f\n', t);
Time for own distance calculation with preallocation: 0.112437
%% Version 2
tic
j=1;
for i = 1:1:size(Y,1)
for k = 1:1:size(X,1)
P = [Y(i,1),Y(i,2);X(k,1),X(k,2)];
B(j,1) = pdist(P,'euclidean'); % one pair
j = j+1;
end
end
size(B)
ans = 1×2
1000000 1
t = toc;
fprintf('Time for distance calculation using Matlab function pdist: %.6f\n', t);
Time for distance calculation using Matlab function pdist: 15.181589
%% Version 2.1
% Pre-allocate B before hand
tic
j=1;
B = inf(size(X,1)*size(Y,1), 1);
for i = 1:1:size(Y,1)
for k = 1:1:size(X,1)
P = [Y(i,1),Y(i,2);X(k,1),X(k,2)];
B(j,1) = pdist(P,'euclidean');
j = j+1;
end
end
size(B)
ans = 1×2
1000000 1
t = toc;
fprintf('Time for distance calculation using Matlab function pdist: %.6f\n', t);
Time for distance calculation using Matlab function pdist: 12.980660
%% Version 3
% pdist of many points (this compute distance x2-x1, x3-x1, ... x1000-x1,
% y1-x1, ..., y10001; x3-x2, ..., x1000-x2, ..., y1000-x2 etc
% doc pdist
tic
p = pdist([X; Y]); % dist
size(p)
ans = 1×2
1 1999000
t = toc;
fprintf('Time for distance calculation using Matlab function pdist (many points): %.6f\n', t);
Time for distance calculation using Matlab function pdist (many points): 0.016222
  1 Kommentar
Sebastian Stumpf
Sebastian Stumpf am 6 Okt. 2021
Thank you for your detailed answer. It looks like I didn't use the function very efficently.
Kind regards

Melden Sie sich an, um zu kommentieren.

Weitere Antworten (0)

Kategorien

Mehr zu Statistics and Machine Learning Toolbox finden Sie in Help Center und File Exchange

Tags

Produkte


Version

R2020a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by