How to quickly group numerical data without giving bin sizes
Dominik Rhiem
on 15 Aug 2023
Commented: Star Strider
on 16 Aug 2023
I am trying to find an efficient and quick way to group numerical data. In short, I have several paths towards a particular pixel, and these paths consist of rays of slightly different lengths (as any ray that crosses the pixel anywhere is valid for a path). These paths can therefore be considered groups of rays. I want to differentiate the paths by their (average) length and select the path that contains the largest number of rays, or, in other words, identify the groups and select the largest group.
Importantly though, I do not just need the length, but also an index to identify one ray, e.g. the "middle" one of the group. (Say I have an array of size 10, and the first 7 and last 3 elements form 2 groups. I would like to identify the groups, then, out of the 7 elements of the larger group, I would like to get the index of the 4th element as the "middle".)
My current solution is to round the ray lengths (to the third decimal, as the pixel size is on the millimeter scale) and use the "mode" function. However, this is both inefficient (because I want to do this column-wise for a matrix that also contains NaN values that I would like to ignore) and in some cases inaccurate. For example:
array = [0.2248 0.2249 0.2250 0.2251 0.2399 0.2400 0.2401];
array2 = round(array,2);
mode(array2)
Of course it would be logical to group the first four entries and the last three, but the rounding operation is ill-suited when the values straddle a rounding boundary (a trailing 5). I have used the histogram function to plot examples in my code, and it groups the entries in a satisfactory way. However, I actively do not want the plot itself, I just need the grouping, and the histogram function seems to have a rather large overhead for this purpose (as this operation has to be performed thousands of times for a proper run of the program). The discretize function unfortunately needs me to give it an explicit number of bins, i.e. I would need to have an a priori idea of the groups.
Is there any function that can efficiently do this, or are there suggestions for a better way to do it myself than "mode"?
Accepted Answer
Star Strider
on 16 Aug 2023
I am not certain that there is a robust approach to this sort of problem. For multivariable problems (each point is a vector determined by more than one value), there are built-in clustering functions. This one is a bit unusual.
The data ideally need to be ordered (although that may not be an absolute requirement), because it is easier to calculate the differences if they are. This approach may be too much for this particular problem; however, I decided to make it a bit more robust so that it is appropriate for other problems as well. I cannot be certain it will be robust for all such problems, and it may need tweaking in some instances.
Try this:
array = [0.2248 0.2249 0.2250 0.2251 0.2399 0.2400 0.2401];
% array = [array array+0.51] % Test Vector
DifMtx = abs(array(:)-array) % Difference Matrix (Pairwise Absolute Differences)
[Col1,ixs] = sort(DifMtx(:,1)); % First Column (Distances From array(1)) & Sort Indices
Col1Dif = diff([0; Col1]); % Ordered Column Differences
BP = [1; find(Col1Dif >= 5*min(Col1Dif(Col1Dif>0))); numel(Col1)+1]; % Break Points
Cluster = cell(1, numel(BP)-1); % Preallocate
for k = 1:numel(BP)-1
idxrng = BP(k) : BP(k+1)-1;
Cluster{k} = array(ixs(idxrng)); % 'ixs' Maps Sorted Positions Back To 'array' (Same As 'idxrng' For Ordered Data)
end
figure
hold on
for k = 1:numel(Cluster)
stem(Cluster{k}, ones(size(Cluster{k})), '.', 'filled', 'DisplayName',["Cluster #"+k])
end
hold off
grid
xlim([0.22 0.245]) % Optional
ylim([0 2])
legend('Location','best')
xlabel('Array')
title('Clusters')
The ‘ixs’ vector indexes into the original ‘Col1’ vector (and the original ‘array’ vector) if that information is needed.
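To get from the clusters to the largest group and its "middle" element, something along these lines might work. It is only a sketch under assumptions: it sorts the values directly instead of using the difference matrix, skips NaN entries, and reuses the factor of 5 from the break-point rule above. The names 'lengths', 'gapFactor', 'members', 'middleIdx' and 'meanLength' are hypothetical, the input is an invented test column, and it assumes at least one non-NaN value.

```matlab
% Sketch: find the largest group in a vector and the index of its middle element.
% Assumption: a gap starts a new group when it is at least gapFactor times
% the smallest positive gap (the same rule as the break points above).
lengths = [0.2400 NaN 0.2248 0.2401 0.2250 0.2249 NaN 0.2251 0.2399]; % hypothetical column
gapFactor = 5;                               % hypothetical tuning value

valid = find(~isnan(lengths));               % positions of the non-NaN entries
[sorted, order] = sort(lengths(valid));      % values in ascending order
origIdx = valid(order);                      % sorted position -> index into 'lengths'

gaps = diff(sorted(:));                      % gaps between sorted neighbours
minGap = min(gaps(gaps > 0));                % smallest positive gap (empty if none)
if isempty(minGap)
    isBreak = false(size(gaps));             % single value or all equal: one group
else
    isBreak = gaps >= gapFactor*minGap;      % large gaps mark group boundaries
end
groupId = cumsum([1; isBreak]);              % group label for each sorted element

counts = accumarray(groupId, 1);             % number of elements per group
[~, biggest] = max(counts);                  % first group with the largest count
members = origIdx(groupId == biggest);       % indices into 'lengths', ordered by value
middleIdx = members(ceil(numel(members)/2)); % middle element of the largest group
meanLength = mean(lengths(members));         % average length of that group
```

For a matrix, the same steps could be run on each column in a loop. The threshold rule is a heuristic, so 'gapFactor' may need adjusting for data with less clearly separated groups.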
2 Comments
Star Strider
on 16 Aug 2023
As always, my pleasure!
I did my best to make it as robust as I could. However, if you encounter a vector with which it has problems, post back and I will see if I can improve it to make it work with the new vector.