Euclidean distance (ED)calculation in matlab

Question

TUSHAR MURATKAR am 11 Sep. 2017

0
Verknüpfen

Direkter Link zu dieser Frage

https://de.mathworks.com/matlabcentral/answers/356103-euclidean-distance-ed-calculation-in-matlab

Kommentiert: Jan am 16 Sep. 2017

In my program, I have a matrix obtained after lexicographic sorting. Its dimensions are 347275x64 double. My question is "how we can find euclidean distances (ED) between pairs of rows in this matrix?" Also, in my research paper, they specify that we don't need to find the ED between all the pairs of rows, but only for those pairs of rows which are close to offset threshold. So how to decide this value of offset threshold? Can anyone share his / her knowledge about this?

Please help me in this euclidean distance problem.

Thanks in advance.

3 Kommentare
1 älteren Kommentar anzeigen1 älteren Kommentar ausblenden

Walter Roberson am 12 Sep. 2017

"If it's done on the ED, then how can you know which pairs are "close" until you've done them all?"

KDTree type searches and quadtree and octree type searches can be more efficient than exhaustive search for this purpose; https://en.wikipedia.org/wiki/K-d_tree#Complexity . For KDTree the building cost is O(n * log(n)) or sometimes less; each query is O(log n)

TUSHAR MURATKAR am 14 Sep. 2017

@image analyst, I am following a published paper and in that they didn't mentioned any thing about offset threshold except this formula. The formula is " abs[index(u)-index(v)] ≤ Offset Threshold. Can you help me with this.

Melden Sie sich an, um zu kommentieren.

Melden Sie sich an, um diese Frage zu beantworten.

Answer 1

John BG am 12 Sep. 2017

0
Verknüpfen

Direkter Link zu dieser Antwort

https://de.mathworks.com/matlabcentral/answers/356103-euclidean-distance-ed-calculation-in-matlab#answer_281222

Bearbeitet: John BG am 13 Sep. 2017

In MATLAB Online öffnen

Hi TUSHAR MURATKAR

this is John BG <mailto:jgb2012@sky.com jgb2012@sky.com>

I have written a scaled solution with only 12 rows

1.

the Euclidean distance function you need can be written in different ways, one of them

distn=@(a,k,p) (sum((a(k,:)-a(p,:)).^2))^.5

testing the distance function

   A=randi([-10 10],12,5);
  distn =
    function_handle with value:
      @(a,k,p)(sum((a(k,:)-a(p,:)).^2))^.5
   distn(A,1,2)
  ans =
     20.4450
   distn(A,3,7)
  ans =
     17.8326

2.

Now, how many combinations without repetition pairs do you have?

    A=randi([-10 10],12,5);
    L=nchoosek([1:1:12],2);

here A is a the test matrix, replace A with your input matrix.

3.

Applying the offset threshold

DE=[0 0 0];
off_thresh=10;
for k=1:1:length(L)
    if distn(A,L(k,1),L(k,2))>off_thresh
        DE=[DE;L(k,1) L(k,2) distn(A,L(k,1),L(k,2))];
    end
end
DE(1,:)=[];

4.

The result is DE, with 1st and 2nd columns defining the pair of rows and the 3rd column is the Euclidean distance.

DE =        
0000    2.0000   20.4450
0000    3.0000   13.8203
0000    4.0000   17.3781
0000    5.0000   22.0454
0000    6.0000   16.8226
0000    7.0000   22.6053
0000    8.0000   23.7908
..
0000   10.0000   17.0587
0000   11.0000   19.7231
0000   12.0000   14.3178
0000   11.0000   24.2487
0000   12.0000   18.8149
0000   12.0000   23.7065

5.

note that the higher the offset threshold off_thresh is defined, the fewer the amount of pairs collected in DE, which is consistent.

If you find this answer useful would you please be so kind to consider marking my answer as Accepted Answer?

To any other reader, if you find this answer useful please consider clicking on the thumbs-up vote link

thanks in advance

John BG

<mailto:jgb2012@sky.com jgb2012@sky.com>

3 Kommentare
1 älteren Kommentar anzeigen1 älteren Kommentar ausblenden

TUSHAR MURATKAR am 12 Sep. 2017

In MATLAB Online öffnen

distn=@(a,k,p) (sum((a(k,:)-a(p,:)).^2))^.5;
L=nchoosek({1:1:256},2);
DE=zeros(1,256);
off_thresh=10;
for k=1:1:length(L)
    if distn(A,L(k,1),L(k,2))>off_thresh
        DE=[DE;L(k,1) L(k,2) distn(A,L(k,1),L(k,2))];
    end
end
DE(1,:)=[];

I tried your code with some modifications but i got "DE" as 0 i.e. 0x256. Pls can you suggest the correct changes. Thanks in advance.

Jan am 15 Sep. 2017

Bearbeitet: Jan am 15 Sep. 2017

@TUSHAR: Accepting an answer means, that the problem is solved. Afterwards the chances are lower that your thread is read. Please select the answer as accepted, which solved your problem.

While creating the output by an iterative growing works for an input with 256 elements, it will run for a very very long time for 347275 elements. The "DE=[DE; ...]" approach reserves a completely new array in each iteration, such that if the threshold let 1e5 elements pass the test (a guess), Matlab needs to allocate sum(1:1e5)*8 bytes = 40 GB, although the final result uses 800 kB only.

General rules:

Do not let arrays grow iteratively.
A function which works for a tiny data set need not be useful for a large data set, if combinations or permutations are part of the problem.

Melden Sie sich an, um zu kommentieren.

Answer 2

Walter Roberson am 12 Sep. 2017

3
Verknüpfen

Direkter Link zu dieser Antwort

https://de.mathworks.com/matlabcentral/answers/356103-euclidean-distance-ed-calculation-in-matlab#answer_281189

See rangesearch() from the Statistics toolbox.

4 Kommentare
2 ältere Kommentare anzeigen2 ältere Kommentare ausblenden

TUSHAR MURATKAR am 12 Sep. 2017

@ walter... ok, but i am not getting it. can you give small example. And how i will select the value o threshold.

Walter Roberson am 12 Sep. 2017

The rangesearch documentation has an example of searching by radius with a KDTree; https://www.mathworks.com/help/stats/searcher.rangesearch.html#bui0h43-1

Melden Sie sich an, um zu kommentieren.

Answer 3

Jan am 15 Sep. 2017

3
Verknüpfen

Direkter Link zu dieser Antwort

https://de.mathworks.com/matlabcentral/answers/356103-euclidean-distance-ed-calculation-in-matlab#answer_281678

Bearbeitet: Jan am 15 Sep. 2017

In MATLAB Online öffnen

Data   = rand(347275, 64) * 1000;
nData  = size(Data, 1);
L      = nchoosek(1:nData, 2);   % Not "{1:1:nData}", but a vector
DE     = zeros(2, size(L, 1));   % Pre-allocate
iDE    = 0;
thresh = 10^2;               % Squared distance
for k = 1:size(L, 2)         % SIZE(L, 2) is safer than LENGTH(L)
  % Using the function directly is much faster than an anonymous function.
  % Calculate the expensive SQRT on demand only:
  D2 = sum((Data(L(k,1), :) - Data(L(k,2), :)) .^ 2);
  if D2 < thresh
    iDE        = iDE + 1;
    DE(1, iDE) = sqrt(D2);   % Store value and index
    DE(2, iDE) = k; 
  end
end
DE = DE(:, 1:iDE);           % Crop unused pre-allocated data

Now DE(1, :) contains the distances and L(DE(2, :)) the corresponding indices.

Note that nchoosek(1:347273, 2) has 6.03e10*2 elements. Matlab's nchoosek stores this in a double matrix, such that you need 964.8 GB RAM for this matrix. Using the faster FEX: VChooseK can reduce this by 50% using UINT32 indices:

L = VChooseK(uint32(1):uint32(nData), 2);

But a half TB is still a lot of free RAM. It will be cheaper to create the indices dynamically, but the pre-allocation of the output remains a problem.

% [EDITED, UTC 15:42 15-Sep-2017, Typos fixed, indices fixed, transposed Data]
Data   = transpose(Data);     % Faster to access 1st dimension
nData  = size(Data, 2);
nOut   = 1e6;                 % A guess
DE     = zeros(3, nOut);      % Pre-allocate the output
iDE    = 0;
thresh = 10^2;                % Squared distance
L      = nchoosek(nData, 2);  % Number of outputs, not an array
k1     = 1;
k2     = 2;
for k = 1:L
  D2 = sum((Data(k1, :) - Data(k2, :)) .^ 2);
  if D2 < thresh
    iDE = iDE + 1;
    if iDE > nOut              % Expand the array, a re-allocation
      nOut = nOut + 1e6;
      DE(:, nOut) = 0;
    end
    DE(:, iDE) = [sqrt(D2), k1, k2];
  end
    % Get new indices:
    k2 = k2 + 1;
    if k2 > nData
      k1 = k1 + 1;
      k2 = k1 + 1;
    end
  end
  DE = DE(:, 1:iDE);           % Crop unused pre-allocated data

This is still a huge problem and needs about 2.5 days to run. It can be accelerated by a vectorization, but I do not have more time currently to implement this. Walter's suggestion rangesearch() seems to be more promising.

[EDITED] A vectorized and leaner version:

function Near = GetNearPoints(X, Thresh)
% Input:  X:      [nPoint x m] matrix, nPoints with m dimensions
%         Thresh: Positive value
% Output: Near:   [nOut x 3], The indices of the rows and the distance
% Author: Jan Simon, License: CC BY-SA 3.0
D  = transpose(X);   
n  = size(D, 2);
C  = cell(1, n);      % Collect output
T2 = Thresh ^ 2;      % Squared threshold
for k = 1:n
   D2 = sum(bsxfun(@minus, D(:, k), D(:, k+1:n)) .^ 2, 1);
   m  = (D2 < T2);  % Compare squared distances
   if any(m)
      C{k} = [repmat(k, sum(m), 1), find(m).' + k, sqrt(D2(m)).'];
   end
end
Near = cat(1, C{:});

This is a little bit faster. A parfor will be useful, if you have the parallel processing toolbox. With 8 cores, this might take some hours only. This brute force method cannot be much faster.

3 Kommentare
1 älteren Kommentar anzeigen1 älteren Kommentar ausblenden

Stephen23 am 15 Sep. 2017

@TUSHAR MURATKAR: you can thank Jan Simon by accepting the answer that actually solves your question.

Jan am 16 Sep. 2017

I do not think that this brute force method solves the problem efficiently, but Walter's suggestion is much better.

A divide and conquer approach will be smarter also: Searching the complete data set requires nchoosek(347275, 2) = 60.3e9 comparisons. If the volume is split into 2 halves (and considering the an extra interval with the width of the threshold), reduces the problem to 2*nchoosek(347275, 2) + X = 30.1e9 comparisons (plus the small overhead for the margin). This will reduce the processing speed by nearly 50% - and this can be applied repeatedly in X and Y direction.

If this strategy is applied strictly, you come to a k-D tree approach, see e.g. https://www.mathworks.com/matlabcentral/fileexchange/4586-k-d-tree. Unfortunately I cannot get this code to work...

Melden Sie sich an, um zu kommentieren.

Euclidean distance (ED)calculation in matlab

3 Kommentare
1 älteren Kommentar anzeigen1 älteren Kommentar ausblenden

Akzeptierte Antwort

3 Kommentare
1 älteren Kommentar anzeigen1 älteren Kommentar ausblenden

Weitere Antworten (2)

4 Kommentare
2 ältere Kommentare anzeigen2 ältere Kommentare ausblenden

3 Kommentare
1 älteren Kommentar anzeigen1 älteren Kommentar ausblenden

Siehe auch

Kategorien

Tags

Produkte

Community Treasure Hunt

Euclidean distance (ED)calculation in matlab

3 Kommentare 1 älteren Kommentar anzeigen1 älteren Kommentar ausblenden

Akzeptierte Antwort

3 Kommentare 1 älteren Kommentar anzeigen1 älteren Kommentar ausblenden

Weitere Antworten (2)

4 Kommentare 2 ältere Kommentare anzeigen2 ältere Kommentare ausblenden

3 Kommentare 1 älteren Kommentar anzeigen1 älteren Kommentar ausblenden

Siehe auch

Kategorien

Tags

Produkte

Community Treasure Hunt

3 Kommentare
1 älteren Kommentar anzeigen1 älteren Kommentar ausblenden

3 Kommentare
1 älteren Kommentar anzeigen1 älteren Kommentar ausblenden

4 Kommentare
2 ältere Kommentare anzeigen2 ältere Kommentare ausblenden

3 Kommentare
1 älteren Kommentar anzeigen1 älteren Kommentar ausblenden