# Identifying regions in matrix rows

12 Ansichten (letzte 30 Tage)
Dominik Rhiem am 18 Sep. 2023
Bearbeitet: Bruno Luong am 20 Sep. 2023
I have an M x N matrix that contains the indices of columns where an external "validity criterion" is fulfilled, and -1 otherwise. Example:
matrix = zeros (5,8) -1;
matrix(1,1:3) = 1:3;
matrix(3,2:3) = 2:3;
matrix(3,5:7) = 5:7;
matrix(4,2:5) = 2:5;
matrix
matrix = 5×8
1 2 3 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 2 3 -1 5 6 7 -1 -1 2 3 4 5 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
Let's call the input above the "regions", i.e., here we have in total 4 regions (always considered row-wise). My objective is to identify, for each row, the largest region, and within that region the central value (rounded up). Currently, I do this as follows:
dummy = zeros(5,1) -1;
jumps = abs(diff([dummy, matrix, dummy], [], 2))
jumps = 5×9
2 1 1 4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 1 4 6 1 1 8 0 0 3 1 1 1 6 0 0 0 0 0 0 0 0 0 0 0 0
[borders_row, borders_col] = find(jumps>1);
That way, I identify the borders of the regions. Afterwards, I go through each row individually to identify the largest region, as I mentioned, which I do as follows:
first_idx = zeros(1,5);
second_idx = false(1,5);
for i = 1:5
borders_cur = borders_col(borders_row==i).';
ranges = diff(borders_cur);
ranges = ranges(1:2:end);
[width, path_idx] = max(ranges);
idxx = borders_cur(path_idx)-1 + ceil(width/2);
if ~isempty(idxx)
first_idx(i) = idxx;
second_idx(i) = logical(idxx);
end
end
For each row, I consider the current borders and first calculate the distance between them, so that I can judge which region is larger if there is more than one. I skip every second value because those indicate the "invalid spaces" in between the regions. I then pick the bigger region and calculate its central index, which I then record in two arrays (first_idx and second_idx).
I want to get rid of the loop as it's computationally expensive, and generally optimise the code as it's part of a bigger loop, and this is one of the slowest parts of the code currently. Any suggestions?
##### 4 Kommentare2 ältere Kommentare anzeigen2 ältere Kommentare ausblenden
Bruno Luong am 18 Sep. 2023
Bearbeitet: Bruno Luong am 18 Sep. 2023
IMO there is a bug in these three lines
ranges = ranges(1:2:end);
[width, path_idx] = max(ranges);
idxx = borders_cur(path_idx)-1 + ceil(width/2)
Your range is step 2, your path_idx is relative to then subarray of step2 so you should index with borders_cur(2*path_idx-1) in the last line.
Or alternatively you have to make borders_cur step 2 too.
ranges = ranges(1:2:end);
borders_cur = borders_cur(1:2:end); % missing
[width, path_idx] = max(ranges);
idxx = borders_cur(path_idx)-1 + ceil(width/2)
Dominik Rhiem am 19 Sep. 2023
@Bruno Luong Good catch! I have decided to use borders_cur(2*path_idx-1) since it is more clear, in my opinion.

Melden Sie sich an, um zu kommentieren.

### Akzeptierte Antwort

Bruno Luong am 18 Sep. 2023
Bearbeitet: Bruno Luong am 19 Sep. 2023
matrix = zeros (5,8) -1;
matrix(1,1:3) = 1:3;
matrix(3,2:3) = 2:3;
matrix(3,5:7) = 5:7;
matrix(4,2:5) = 2:5;
matrix
matrix = 5×8
1 2 3 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 2 3 -1 5 6 7 -1 -1 2 3 4 5 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
dummy = zeros(5,1) -1;
jumps = abs(diff([dummy, matrix, dummy], [], 2));
[borders_row, borders_col] = find(jumps>1);
first_idx = zeros(1,5);
second_idx = false(1,5);
for i = 1:5
borders_cur = borders_col(borders_row==i).';
ranges = diff(borders_cur);
ranges = ranges(1:2:end);
[width, path_idx] = max(ranges);
idxx = borders_cur(2*path_idx-1)-1 + ceil(width/2); % modified by BLU
if ~isempty(idxx)
first_idx(i) = idxx;
second_idx(i) = logical(idxx);
end
end
first_idx
first_idx = 1×5
2 0 6 3 0
% my method
A = (matrix>0).';
n = size(A,2);
f = false(1, n);
[i,j] = find(diff([f; A; f]));
c = i(1:2:end);
w = i(2:2:end)-c;
r = j(1:2:end);
[rw,is] = sortrows([r,w],[1 -2]);
keep = [true; diff(rw(:,1))>0];
rw = rw(keep,:);
c = c(is(keep));
r = rw(:,1);
w = rw(:,2);
mididx = c-1+ceil(w/2);
first_idx = accumarray(r, mididx, [n,1])'
first_idx = 1×5
2 0 6 3 0
second_idx = logical(first_idx);
##### 3 Kommentare1 älteren Kommentar anzeigen1 älteren Kommentar ausblenden
Dominik Rhiem am 19 Sep. 2023
I have been testing your suggestion, and I really like it. It's quick and produces the correct results. Thank you!
Bruno Luong am 19 Sep. 2023

Melden Sie sich an, um zu kommentieren.

### Weitere Antworten (4)

Jon am 18 Sep. 2023
Bearbeitet: Jon am 18 Sep. 2023
I wasn't completely clear from your description what output you wanted, but I think this is what you were looking for. Still loops through rows of M, but maybe more efficient that what you had tried. Would have to time it to see.
% Build example input matrix
M = zeros (5,8) -1;
M(1,1:3) = 1:3;
M(3,2:3) = 2:3;
M(3,5:7) = 5:7;
M(4,2:5) = 2:5;
M
M = 5×8
1 2 3 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 2 3 -1 5 6 7 -1 -1 2 3 4 5 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
% Make matrix that has ones where there are values and 0 otherwise
V = M ~= -1;
% Loop through rows finding midpoint values of longest blocks
[m,n] = size(V);
midVals = NaN(m,1); % preallocate array to hold middle region values
for k = 1:m
% Find indices of beginning and end of each block of values
idx = [true,diff(V(k,:))~=0,true];
% Find length of each block
blklngth = diff(find(idx));
% Make a vector whose elements give length of the block that each
% element in original matrix belongs to
x = repelem(blklngth,blklngth).*V(k,:);
% Find the index of the middle (rounding up) of the longest block
% and assign middle value to output vector
maxBlkLngth = max(x);
if maxBlkLngth > 0
idxMid = round(mean(find(x == max(x))));
midVals(k) = M(k,idxMid);
end
end
midVals
midVals = 5×1
2 NaN 6 4 NaN
##### 12 Kommentare10 ältere Kommentare anzeigen10 ältere Kommentare ausblenden
Bruno Luong am 20 Sep. 2023
Bearbeitet: Bruno Luong am 20 Sep. 2023
But I change V to double just before find (you don't need the original V afterwards).
V metamorphoses and has three lifes
Jon am 20 Sep. 2023
Sorry, Bruno, I didn't read your modified file carefully enough. Also missed that by reassigning V to diff([false;V(:)]) V has now become a double vector. So now I see that as a double we can assign the block lengths to it, and because it is a vector we must later reshape it. That's great! Thanks so much. Your code is always inspiring.

Melden Sie sich an, um zu kommentieren.

Mathieu NOE am 18 Sep. 2023
hello
I tried to put some code together and ended with that ; maybe interesting (?)
it will detect the largest region for each row and store (and display - see red crosses ) the center position of each region
I haven't rounded these values so add it if you need it
matrix = zeros (5,8) -1;
matrix(1,1:3) = 1:3;
matrix(3,2:3) = 2:3;
matrix(3,5:7) = 5:7;
matrix(4,2:5) = 2:5;
matrix
matrix = 5×8
1 2 3 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 2 3 -1 5 6 7 -1 -1 2 3 4 5 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
%% main code
mat = (matrix>0); % creat logical array
for k = 1:5
[begin,ends] = find_start_end_group(mat(k,:));
ll = ends-begin+1;
if isempty(ll)
index_pos(k) = NaN;
else
if numel(ll)>1 % more than one region in this row
[v,id] = max(ll);
index_pos(k) = 0.5*(begin(id)+ends(id)); % % center position (index) of largest region
else
index_pos(k) = 0.5*(begin+ends); % % center position (index) of largest region
end
end
end
% plot
cmap = [0 0 0; parula(128)];
matrix(matrix<0) = NaN;
imagesc(matrix)
colormap(cmap);
caxis([-1 10]);
colorbar('vert');
hold on
for k = 1:5
plot(index_pos(k),k,'r+','markersize',25);
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [begin,ends] = find_start_end_group(ind)
% This locates the beginning /ending points of data groups
% Important : ind must be a LOGICAL array
D = diff([0;ind(:);0]);
begin = find(D == 1);
ends = find(D == -1) - 1;
end
##### 0 Kommentare-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

Melden Sie sich an, um zu kommentieren.

Fifteen12 am 18 Sep. 2023
I've also put some work into this, but I'm unclear with what you want as your output, so I've got it to a good stopping place and I'll let you finish it with what you want to do. This script basically identifies the first and last index for every region in an inputted matrix using vectorized calls.
function [first, last] = findRegion(in)
valid = find((in + 1)');
neighbors = [0; valid];
regions = [valid; numel(in) + 1] - [-1; valid]; %Spaces between each valid index (first cell inflated for easier processing, last cell is handled later)
starting_indices = valid(regions > 1);
temp = logical([regions > 1; 0]);
ending_indices = valid(temp(2:end));
if (starting_indices(end)) == numel(in)
ending_indices(end+1) = numel(in);
end
%% Separate regions on different rows
rows = transpose(0:width(in):numel(in));
row_pairs = [floor((starting_indices - 1) ./ width(in)), floor((ending_indices - 1) ./ width(in))];
wrapped_regions = logical(row_pairs(:, 1) - row_pairs(:, 2));
first = sort([starting_indices; rows(wrapped_regions) + 1]);
last = sort([ending_indices; rows(wrapped_regions)]);
end
I wasn't sure if the largest region was the one with the most values or the one with the highest aggregated value, but I think you can probably calculate either from here.
##### 0 Kommentare-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

Melden Sie sich an, um zu kommentieren.

Image Analyst am 19 Sep. 2023
If you have the Image Processing Toolbox, it's pretty easy:
M = [...]
1 2 3 -1 -1 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1
-1 2 3 -1 5 6 7 -1
-1 2 3 4 5 -1 -1 -1
-1 -1 -1 -1 -1 -1 -1 -1]
[rows, columns] = size(M); % Get dimensions.
% Process each row one at a time.
for row = 1 : rows
mask = M(row, :) ~= -1;
% Now, not sure how to define the "central value rounded up"
end
So I'm just not sure how you define central value. What would be the central value of a vector like this: [3,6,1,9]?
##### 2 KommentareKeine anzeigenKeine ausblenden
Image Analyst am 20 Sep. 2023
I know you were happy with your times of around 0.2 seconds, but with my solution I'm getting around 25 times faster than that on my 12 year old computer:
% Process each row one at a time.
tic
for row = 1 : rows
mask = M(row, :) ~= -1;
% Now, not sure how to define the "central value rounded up"
end
toc
Elapsed time is 0.007568 seconds.
However it does require the Image Processing Toolbox for the regionprops function, which is specifically made to detect and measure regions like that.
Bruno Luong am 20 Sep. 2023
Bearbeitet: Bruno Luong am 20 Sep. 2023
@Image Analyst 0.1-0.2 second is time with data of size 1000 x 10000 as bellow, not with OP toy example. Your code take 4 seconds which is more than 20 time slower than our codes (without tollbox).
M = repmat(1:1000, 10000, 1);
M(randi(end, 1000000, 1)) = -1;
tic
for row = 1 : size(M,1)
mask = M(row, :) ~= -1;
% Now, not sure how to define the "central value rounded up"
end
toc
Elapsed time is 4.032186 seconds.

Melden Sie sich an, um zu kommentieren.

### Kategorien

Mehr zu Loops and Conditional Statements finden Sie in Help Center und File Exchange

### Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by