Clustering data in a 1-D vector

paganelle on 28 Oct 2020
I have a large logical vector that looks like V = [0 0 0 0 0 0 0 1 1 1 0 0 1 1 1 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 ..............]
I need to find the position of each group of 1s (say, the center of each group), but if two groups of 1s are too close to each other (say, fewer than 3 zeros in between), I need to treat them as a single group. That is, in the first stage I need to find the groups (the bold-underlined elements), and then find the center element of each group (a shift of +/-1 element does not matter).
1st stage (clustering): [0 0 0 0 0 0 0 1 1 1 0 0 1 1 1 0 0 0 0 0 0 0 1 1 1 1 1 1 1 0 0 0 0 ..............]
2nd stage (find the center of each cluster): [0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 ..............]
The way I have implemented it so far is as follows: I smooth the entire vector (it has a couple of million elements). The span is chosen to equal the maximum expected length of a group, and then I look for local maxima (islocalmax) with 'MinSeparation' set to the minimum distance between groups. It works, but it is really slow (I have 360x180 = 64800 vectors; yes, it is a LAT/LONG grid with ~10M elements in each vector).
Is there any way to speed this up? I believe there should be some "textbook" examples of this!

Accepted Answer

Adam Danz on 28 Oct 2020
Edited: Adam Danz on 28 Oct 2020
There are lots of alternatives.
  • Input A is a vector of 1s and 0s.
  • n is the minimum number of 0s between 1s required to separate groups of 1s.
  • T is a table showing the start and stop index of each group of 1s, where runs of 1s separated by fewer than n zeros are joined into one group, along with the length of each group.
A = [0 0 0 1 1 1 0 0 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 1 1 0 1 0 1 1 0 1 0 0 0 0 1 1 1 1];
% Length of each group of consecutive 1s
T = table();
T.OnesLength = diff(find([0;A(:);0]==0))-1;
T(T.OnesLength==0,:) = [];
% Index of 1st '1' in each group of consecutive 1s
T.OnesStart = find(diff([0;A(:)])==1);
% Index of last '1' in each group of consecutive 1s
T.OnesStop = T.OnesStart + T.OnesLength - 1;
% Determine the number of 0s between consecutive 1s
ZerosBetween = [T.OnesStart(2:end) - T.OnesStop(1:end-1); NaN]-1;
disp(T)
    OnesLength    OnesStart    OnesStop
    __________    _________    ________

         3             4           6
         3             9          11
         6            18          23
         2            29          30
         1            32          32
         2            34          35
         1            37          37
         4            42          45
% Join groups of consecutive 1s with fewer than n zeros between them.
n = 3;
joinGroups = ZerosBetween < n;
t = find(diff([0;joinGroups])==1);
f = find(diff([0;joinGroups])==-1);
T.remove = false(height(T),1);
for i = 1:numel(t)
    T.OnesStop(t(i)) = T.OnesStop(f(i));
    T.OnesLength(t(i)) = sum(T.OnesLength(t(i):f(i))) + sum(ZerosBetween(t(i):f(i)-1));
    T.remove(t(i)+1:f(i)) = true;
end
T(T.remove,:) = [];
T.remove = [];
disp(T)
    OnesLength    OnesStart    OnesStop
    __________    _________    ________

         8             4          11
         6            18          23
         9            29          37
         4            42          45
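For very long vectors, the same merged start/stop indices can probably be obtained without the table and the loop. The following is only a sketch of that idea, not the code from the answer above: the names d, starts, stops, gaps, keep, mStarts and mStops are made up for illustration, and it assumes A contains at least one 1.

```matlab
A = [0 0 0 1 1 1 0 0 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 1 1 0 1 0 1 1 0 1 0 0 0 0 1 1 1 1];
n = 3;                                % gaps with fewer than n zeros are bridged

% Pad with zeros so that runs touching either end of A are detected too
d = diff([0; A(:); 0]);
starts = find(d == 1);                % index of the first 1 in each run
stops  = find(d == -1) - 1;           % index of the last 1 in each run

% Number of zeros between each run and the next one
gaps = starts(2:end) - stops(1:end-1) - 1;

% A gap of at least n zeros separates two groups; smaller gaps are bridged
keep = gaps >= n;
mStarts = starts([true; keep]);       % a group starts after every kept gap
mStops  = stops([keep; true]);        % a group stops before every kept gap

disp([mStarts mStops])
```

For the example vector this should give the same start and stop indices as the final table above. Whether it is actually faster on ~10M-element vectors has not been measured here.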
Now you can use the segment length and the start/stop indices to compute the segment centers.
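As a minimal sketch of that last step: the start and stop values below are copied from the final table rather than read from T, so that the snippet runs on its own, and the names centers and C are assumptions for illustration. Rounding down with floor is an arbitrary choice, which should be fine since a shift of +/-1 element does not matter.

```matlab
% Start/stop indices of the merged groups (values from the final table)
OnesStart = [4; 18; 29; 42];
OnesStop  = [11; 23; 37; 45];

% Midpoint of each group, rounded down for groups of even length
centers = floor((OnesStart + OnesStop) / 2);

% Logical vector of the same length as A (45 elements) marking the centers
C = false(1, 45);
C(centers) = true;

disp(centers.')
```

With a table T as built above, the same line would read centers = floor((T.OnesStart + T.OnesStop)/2).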
  1 Comment
paganelle on 28 Oct 2020
Perfect way, thank you!
It is ~5 times faster than the method I used previously.
