Delete rows with bad data and surrounding rows

I would like to delete rows that contain ones, since ones indicate bad data (inclusion criterion 1). Moreover, I would like to remove rows that are surrounded by those rows with bad information. The aim is to only include rows if they are part of a set of at least 3 consecutive good (all zeros) rows (inclusion criterion 2). I created a matrix B to explain my question:
B = [0 1 0 0 1 0 1;
0 0 0 0 0 0 0;
0 1 0 0 1 0 1;
0 1 0 0 0 1 0;
0 0 0 0 0 0 0;
0 1 0 1 1 0 1;
0 0 0 0 0 0 0;
0 0 0 0 0 0 0;
0 1 0 0 0 1 0;
0 1 0 0 0 1 0;
0 0 0 0 0 0 0;
0 0 0 0 0 0 0;
0 0 0 0 0 0 0;
0 1 0 0 1 1 0;
0 0 0 0 0 0 0;
0 0 0 0 0 0 0;
0 0 0 0 0 0 0;
1 0 0 0 1 0 0;
1 0 1 1 1 0 1];
In this 19x7 matrix, rows 1, 3, 4, 6, 9, 10, 14, 18 and 19 would be deleted by inclusion criterion 1. So far my loop (for multiple matrices like B) works. Regarding inclusion criterion 2, rows 2, 5, 7 and 8 must be deleted as well, since they are not part of a set of 3 or more consecutive all-zero rows. For inclusion criterion 2 I have to create an if structure in my existing loop.
% find or strcmp to look for the rows
% todelete = [] to eliminate these r
How can I delete rows that contain ones OR (||) are part of a set of fewer than 3 consecutive all-zero rows?

2 Comments

madhan ravi on 26 Jul 2019
Would you mind showing what your expected result should look like?
L Maas on 26 Jul 2019
In this example of matrix B, the result is a 6x7 matrix of only zeros. In my original data, the matrix contains columns with actual data as well. For this example, I want the result to be a new matrix containing only original rows 11, 12, 13 and 15, 16, 17. In my actual data the result will then look something like:
1 0 0 4.5677 0 0 0 6.346
2 0 0 3.78768 0 0 0 8.345
3 0 0 2.5 0 0 0 0.334
4 0 0 4.97678 0 0 0 3.572
5 0 0 1.2903 0 0 0 2.340
6 0 0 0.72372 0 0 0 34.02


Accepted Answer

Jon on 26 Jul 2019

Here's another approach
% script to clean data
B = [0 1 0 0 1 0 1;
0 0 0 0 0 0 0;
0 1 0 0 1 0 1;
0 1 0 0 0 1 0;
0 0 0 0 0 0 0;
0 1 0 1 1 0 1;
0 0 0 0 0 0 0;
0 0 0 0 0 0 0;
0 1 0 0 0 1 0;
0 1 0 0 0 1 0;
0 0 0 0 0 0 0;
0 0 0 0 0 0 0;
0 0 0 0 0 0 0;
0 1 0 0 1 1 0;
0 0 0 0 0 0 0;
0 0 0 0 0 0 0;
0 0 0 0 0 0 0;
1 0 0 0 1 0 0;
1 0 1 1 1 0 1];
D = rand(size(B)); % data matrix to be cleaned
% assign parameters
minRun = 3; % minimum number of adjacent rows to be considered good data
% make vector with ones for the good rows (rows with only zeros)
iGood = ~any(B,2);
% now mark the locations where each run of ones (good rows) begins and
% ends
% use diff to create jumps at transitions, pad with 1's to ensure jump at start and end
isJump = [1; diff(iGood(:))~=0; 1];
% find location of jumps
jmpIdx = find(isJump);
% find run lengths of zeros, and ones
n = diff(jmpIdx);
% n has the lengths of runs of zeros, and runs of ones interleaved, but we
% need to find out whether it starts with the zeros, or starts with the
% ones
if iGood(1) == 1
    % starts with ones
    offset = 0;
else
    % starts with zeros
    offset = 1;
end
% in preparation for using repelem, build a vector with alternating
% values of zero and run lengths
run = zeros(size(n)); % initialize and preallocate
iStart = 1 + offset; % element where first run of ones starts
run(iStart:2:end) = n(iStart:2:end);
% assign the run lengths corresponding to each row
runLength = repelem(run,n);
% only keep rows in B that are members of sufficiently wide (run length) peaks
idxClean = 1:size(B,1);
idxClean = idxClean(runLength >= minRun);
Bclean = B(idxClean,:);
% also probably want to clean some other matrix based upon status of B
Dclean = D(idxClean,:)
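
To make the intermediate variables in the script above more concrete, here is a small worked sketch on a hypothetical 12-element mask. The mask values and the name runLen are assumptions for illustration only; runLen is used instead of run simply to avoid shadowing MATLAB's built-in run function.

```matlab
% hypothetical good-row mask (assumption): 1 = good row, 0 = bad row
iGood = logical([0 1 0 1 1 1 0 1 1 1 1 0]).';
minRun = 3;

% jumps at every transition, padded so the first and last runs are closed
isJump = [1; diff(double(iGood)) ~= 0; 1];
jmpIdx = find(isJump);           % [1 2 3 4 7 8 12 13].'
n = diff(jmpIdx);                % run lengths: [1 1 1 3 1 4 1].'

% runs alternate bad/good; this mask starts with a bad row,
% so the good runs sit at the even positions of n
iStart = 1 + double(~iGood(1));
runLen = zeros(size(n));
runLen(iStart:2:end) = n(iStart:2:end);   % [0 1 0 3 0 4 0].'

% expand back to one value per row and apply the threshold
runLength = repelem(runLen, n);  % [0 1 0 3 3 3 0 4 4 4 4 0].'
keep = find(runLength >= minRun).';       % rows 4 5 6 8 9 10 11
```

The single good row at position 2 gets run length 1 and is dropped, while the runs of 3 and 4 good rows survive.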

4 Comments

L Maas on 29 Jul 2019
Edited: L Maas on 29 Jul 2019
Thank you for this extensive explanation.
Unfortunately the function to create a vector for the good rows doesn't work when the original matrix contains columns with valid data as well.
% make vector with ones for the good rows (rows with only zeros)
iGood = ~any(A,2);
How can I make this work for matrix A?
A = [0 1 0 0 0 1 0 1;
0 0 0 4.76 3.45 0 0 0;
0 1 0 0 1 1 0 1;
0 1 0 0 1 0 1 0;
0 0 0 2.983 6.9234 0 0 0;
0 1 0 1 0 1 0 1;
0 0 0 5.73 2.394 0 0 0;
0 0 0 9.273 1.2903 0 0 0;
0 1 0 0 0 1 1 0;
0 1 0 0 0 0 1 0;
0 0 0 3.78 7.238 0 0 0;
0 0 0 3.123 5.203 0 0 0;
0 0 0 1.295 0.2673 0 0 0;
0 1 0 0 1 1 1 0;
0 0 0 2.493 2.76 0 0 0;
0 0 0 9.235 2.394 0 0 0;
0 0 0 237.1 4.567 0 0 0;
1 0 0 0 1 1 0 0;
1 0 1 0 1 1 0 1];
My result should look like
0 0 0 3,78 7,2380 0 0 0
0 0 0 3,123 5,203 0 0 0
0 0 0 1,295 0,2673 0 0 0
0 0 0 2,493 2,760 0 0
0 0 0 9,235 2,394 0 0 0
0 0 0 237,1 4,567 0 0 0
Jon on 30 Jul 2019
Edited: Jon on 30 Jul 2019
Just replace the line
% make vector with ones for the good rows (rows with only zeros)
iGood = ~any(B,2);
with
% make vector with ones for the good rows (rows with only zeros)
iGood = ~any(B==1,2);
Note, in the above, I stayed with my original variable B as the matrix that needs cleaning.
Of course you could replace all the B's with A's in my script if you want the matrix to be cleaned to be called A
By the way, I notice that in your example you use "," as the decimal separator where I use ".". I know that this is the convention in Europe and other countries. I'm not sure how MATLAB handles these type of regional differences, but you may want to watch out for that if it causes you any difficulties.
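
As a rough illustration of why the B==1 test helps when data columns are present, here is a minimal sketch on a small hypothetical matrix (the values in A and the name iGoodNaive are assumptions, not the poster's data). One caveat worth keeping in mind: a valid measurement that happens to equal exactly 1 would also be flagged as bad by this test.

```matlab
% hypothetical mixed matrix (assumption): flags are 0/1, data are non-integers
A = [0 1 0 0    0    1;
     0 0 0 4.76 3.45 0;
     1 0 0 0    1    0;
     0 0 0 2.98 6.92 0];

% any(A,2) treats every nonzero entry as bad, so data rows are rejected too
iGoodNaive = ~any(A, 2);     % [0; 0; 0; 0]

% any(A==1,2) only reacts to entries that are exactly 1
iGood = ~any(A == 1, 2);     % [0; 1; 0; 1]
```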
Guillaume on 30 Jul 2019
"I'm not sure how MATLAB handles these type of regional differences"
More often than not: badly, unfortunately.
L Maas on 1 Aug 2019
Now I see that I used dots and commas interchangeably, that's a mistake. I should have used dots: 3.78 7.2380 etc.


More Answers (3)

Guillaume on 26 Jul 2019
Edited: Guillaume on 26 Jul 2019
First, the easiest and fastest way to implement criterion 1 is:
todelete = any(B, 2);
For criterion 2, since you just want to look on either side, you can just shift up or down the above vector:
todeleteall = todelete | [false; todelete(1:end-1)] | [todelete(2:end); false];
B(todeleteall, :) = []
Another way of implementing 2, particularly if you want a larger window than one row on each side, is with a convolution:
halfwindow = 1; %up or down
todeleteall = conv(todelete, ones(2*halfwindow+1, 1), 'same') > 0;
B(todeleteall, :) = []
edit: or as shown by Andrei, you could also use imdilate. There are many ways you could implement that criterion 2. movsum would be another one (which would let you have different before and after good rows).
edit2: As per the cyclist comment, the above is not quite right, see later comment for the actual solution.
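
Since movsum is mentioned above, here is a hedged sketch of how it could be used to test the run length directly rather than the immediate neighbours. The two-pass structure and the names isStart and keep are assumptions, not code from this thread; the bad-row mask is written out by hand from the example B in the question.

```matlab
% bad-row mask for the 19-row example B (rows containing ones)
todelete = false(19, 1);
todelete([1 3 4 6 9 10 14 18 19]) = true;
good = double(~todelete);
minRun = 3;

% pass 1: row i starts a valid run if rows i..i+minRun-1 are all good
% (the default 'shrink' endpoint handling means truncated windows near
% the bottom can never reach minRun)
isStart = movsum(good, [0 minRun-1]) == minRun;

% pass 2: a row is kept if a valid run starts at it or up to
% minRun-1 rows above it
keep = movsum(double(isStart), [minRun-1 0]) > 0;

keptRows = find(keep).';    % 11 12 13 15 16 17
```

Applied to the matrix itself this would be B(keep, :), which keeps the first and last rows of each run because the window is anchored on the run rather than on each row's neighbours.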

11 Comments

the cyclist on 26 Jul 2019
The todeleteall algorithm isn't quite right. I believe it is insisting on a valid row above or below, but that is not required for the first or last row of a set.
Guillaume on 26 Jul 2019
@the cyclist, I'm not sure what you mean. As far as I can tell, all the options I've proposed will only look at the rows below for the top row(s), and the rows above for the bottom row(s).
the cyclist on 26 Jul 2019
Edited: the cyclist on 26 Jul 2019
For example, your solution deletes rows 11 and 13, which are part of the valid set 11,12,13 (if I understand OP's directive properly).
As part of the check of row 11, it is checking for a valid row 10, but that is not necessary (since 11 is the first row of a set).
Guillaume on 26 Jul 2019
Edited: Guillaume on 26 Jul 2019
Oh indeed, you're right, that doesn't quite work.
Ok, the easiest is to use the undocumented fact that strfind works on numeric vectors:
todelete = any(B, 2);
startrun = strfind(todelete', [0, 0, 0]); %need 3 consecutive zeros
tokeep = unique(startrun + [0; 1; 2]);
B = B(tokeep, :)
The downside of this method is that it uses undocumented features so may not work in a future version.
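
If relying on undocumented strfind behaviour is a concern, one possible alternative is to find the run starts with conv, which is documented for numeric vectors. This is only a sketch under that assumption; the names good, starts and windowSum are hypothetical, and the mask is written out by hand from the example B.

```matlab
% bad-row mask for the 19-row example B (rows containing ones)
todelete = false(19, 1);
todelete([1 3 4 6 9 10 14 18 19]) = true;
good = double(~todelete);
minRun = 3;

% windowSum(i) is the number of good rows among rows i..i+minRun-1
windowSum = conv(good, ones(minRun, 1), 'valid');
starts = find(windowSum == minRun);            % [11; 15]

% expand each start into the rows of its window, as in the strfind version
tokeep = unique(starts.' + (0:minRun-1).');    % [11;12;13;15;16;17]
```

As with the strfind version, the last line relies on implicit expansion, so it needs R2016b or later.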
L Maas on 29 Jul 2019
Edited: L Maas on 29 Jul 2019
I get the idea of this script and it works on the created matrix B, but it doesn't work if the data contain columns with actual values such as 4.675, 2.56 and 1.3435 (anything other than 0 and 1). This is a result of:
todelete = any(B, 2)
so this part has to be changed and then it might work in total.
Guillaume on 29 Jul 2019
It's very unclear what criterion 1 / non-good value is in this case. In your question, you said: "good (all zeros) rows". The any(B, 2) will treat non-zero as non-good. If the criterion is now something else, you need to say.
Guillaume on 29 Jul 2019
Edited: Guillaume on 1 Aug 2019
Since you've now shown a proper example in the comment to another answer (but still haven't explained exactly what is a good row or a bad row),
Two options:
  • A bad row is any row where there's a 1:
todelete = any(B == 1, 2);
  • A bad row is any row made exclusively of 0s and 1s
todelete = all(ismember(B, [0 1]), 2);
rest of the code is unchanged
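
The two definitions above do not always agree, which a small sketch may help show. The matrix M and the names bad1 and bad2 are hypothetical, chosen only to exercise the differences.

```matlab
% hypothetical test matrix (assumption)
M = [0 1 0   0;    % flags only, contains a 1
     0 0 4.5 0;    % data row without any 1
     0 0 0   0;    % all zeros
     1 0 2.5 0];   % contains a 1 and a data value

% option 1: bad if the row contains a 1 anywhere
bad1 = any(M == 1, 2);                 % [1; 0; 0; 1]

% option 2: bad if every entry of the row is 0 or 1
bad2 = all(ismember(M, [0 1]), 2);     % [1; 0; 1; 0]
```

Note that option 2 also marks an all-zero row as bad, while option 1 marks a row as bad even when it carries data alongside a 1, so the choice depends on how the flags and data are laid out.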
L Maas on 1 Aug 2019
I understand the confusion. In my data set, bad rows are rows with ones in columns 1, 4, 5, 6, 9, 10 and 11. Good rows have only zeros in these columns. Columns 2, 3, 7 and 8 contain valid values.
I realised today that for my data processing it is not that easy, and apparently not the aim, to remove just rows with only zeros (in columns 1, 4, 5, 6, 9, 10 and 11). So I have to apply another approach. Thank you anyway for thinking along with me.
The above can easily be changed to apply to just certain columns. If the criterion is that good rows have 0s in columns 1, 4, 5, 6, 9, 10 and 11, then
todelete = any(B(:, [1, 4, 5, 6, 9, 10, 11]), 2);
and then, as it got buried in all the comments, the simplest way to apply criterion 2 is:
startrun = strfind(todelete', [0, 0, 0]); %need 3 consecutive zeros
tokeep = unique(startrun + [0; 1; 2]);
B = B(tokeep, :)
L Maas on 2 Aug 2019
Thank you very much, this works!


Andrei Bobrov on 26 Jul 2019
Edited: Andrei Bobrov on 26 Jul 2019

ii: row indices with valid data (imdilate is a function from the Image Processing Toolbox).
ii = find(~imdilate(any(B,2),[1;1;1]));
Another variant
(following L Maas's comment: " a new matrix of only original row 11,12,13 and 15, 16, 17 ")
lo = any(B,2) == 0;
ii_valid = unique(strfind(lo(:)',ones(1,3)) + (0:2)');
the cyclist on 26 Jul 2019
Edited: the cyclist on 26 Jul 2019

Here is one way.
N = size(B,2); % number of columns, needed for the padding rows below
Bm2 = [ones(2,N); B(1:end-2,:)];
Bm1 = [ones(1,N); B(1:end-1,:)];
Bp1 = [B(2:end,:); ones(1,N)];
Bp2 = [B(3:end,:); ones(2,N)];
v = not(any(B, 2));
vm2 = not(any(Bm2,2));
vm1 = not(any(Bm1,2));
vp1 = not(any(Bp1,2));
vp2 = not(any(Bp2,2));
valid = (vm2 & vm1 & v) | (vm1 & v & vp1) | (v & vp1 & vp2);
The output variable valid is a logical vector with "true" at each valid row. Use
find(valid)
to get the indices of the valid rows.
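
The same shifted-copy idea can also be applied to the row mask alone, which avoids building padded copies of the whole matrix. This is a sketch under that assumption rather than the answer's own code; the mask is written out by hand from the example B, and D is a hypothetical data matrix.

```matlab
% good-row mask for the 19-row example B (true = row of all zeros)
v = true(19, 1);
v([1 3 4 6 9 10 14 18 19]) = false;

% shifted copies of the mask, padded with false at the ends
vm1 = [false; v(1:end-1)];
vm2 = [false; false; v(1:end-2)];
vp1 = [v(2:end); false];
vp2 = [v(3:end); false; false];

% a row is valid if it is the last, middle, or first row of a good triple
valid = (vm2 & vm1 & v) | (vm1 & v & vp1) | (v & vp1 & vp2);

% logical indexing works directly, so find is optional
D = rand(19, 7);          % hypothetical data matrix (assumption)
Dclean = D(valid, :);     % 6-by-7
```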


Asked: 26 Jul 2019

Commented: 2 Aug 2019
