How to fill zeros and NaNs with the average of the previous nonzero consecutive values (part 2)
Hi, here's a challenge that is too hard for me and I need enlightenment. A few weeks ago I posted this question:
"So, I have a column like this:
T = [0;0;1;2;3;4;NaN;4;3;0;0;0;NaN;4;2;3;0;2;0];
And every time I have a NaN or a zero, I would like to replace it with the average of the previous nonzero consecutive values (averages in bold):
T = [0;0;1;2;3;4;2.5;4;3;3.5;3.5;3.5;3.5;4;2;3;3;2;2];
I cannot figure out how I can do this in an efficient way (without for loops, because the column has thousands of rows)."
This was the only answer I got, and it helped me a lot:
"Using this FEX download,
T = [0;0;1;2;3;4;NaN;4;3;0;0;0;NaN;4;2;3;0;2;0];
idx=find(T,1);
[stem,T]=deal(T(1:idx-1),T(idx:end));
G=groupTrue(~isnan(T) & T~=0);
[~,~,lengths]=groupLims(groupTrue(~G),1);
T(~G)=repelem( groupFcn(@mean,T,G) ,lengths);
T=[stem;T]
T'
"
However, I think my data sometimes has too many NaNs or zeros, and I get errors like this:
Error using splitapply
Group numbers must be a vector of positive integers, and cannot be a sparse vector.
Error in groupFcn (line 30)
[varargout{1:nargout}]=splitapply(func,varargin{:},G);
Error in PowerDependency (line 130)
T(~G)=repelem( groupFcn(@mean,T,G) ,lengths);
This makes me think it's risky to use these kinds of functions, since I can't understand them when errors come up (lol).
Anyway, my request is: can anyone figure out another way? It doesn't have to be super efficient or instant; I can handle a few seconds of run time, but not too much, like verifying row by row :(
I would be extremely happy if someone could help, cheers!
14 Comments
Dyuman Joshi
on 3 Apr 2023
The output doesn't match your description -
T = [0;0;1;2;3;4;NaN;4;3;0;0;0;NaN;4;2;3;0;2;0];
Tout = [0;0;1;2;3;4;2.5;4;3;3.5;3.5;3.5;3.5;4;2;3;3;2;2];
- The two nonzero consecutive numbers before the 1st NaN are 3 and 4, whose average is 3.5, but you have stated 2.5
- The two nonzero consecutive numbers before the second-to-last and last zeros are 2 and 3, whose average is 2.5, but you have stated 3 and 2, respectively
Cris LaPierre
on 3 Apr 2023
Edited: Cris LaPierre
on 3 Apr 2023
If the solution needs to be specific to your data, it would be helpful if you could share your data. Save it to a MAT-file and attach it to your post using the paperclip icon.
"I think sometimes my data has too many NaNs or zeros and I get errors like this..."
I'd venture it's not enough, rather than too many. If there were an empty result, that would produce the message.
Don't guess: set a breakpoint and see what is actually going on when the error occurs. We can't replicate the problem, as you've not supplied a test case, nor is everything used in the code snippet above defined...
What's the answer for the first two elements of your input array T? They are zero, so by your definition they need to be replaced, but there's no finite, nonzero value preceding them.
"...I can handle a few seconds of run time, but not too much like verifying row by row ..."
I'd venture a loop would be just fine; the code above is just hiding all the looping it does internally to find the groups, and it then uses splitapply, which is also a looping construct internally.
Just locating the start/stop positions and walking through them will probably be at least as fast, if not faster...
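A minimal sketch of that start/stop approach, assuming the rule implied by the expected output in the question: each run of zeros/NaNs is replaced by the mean of the whole run of valid values just before it, and leading zeros/NaNs (which have nothing before them) are left unchanged. The variable names (`bad`, `runStart`, `runStop`, `Tout`) are illustrative only.

```matlab
% Sample column from the question
T = [0;0;1;2;3;4;NaN;4;3;0;0;0;NaN;4;2;3;0;2;0];

bad = isnan(T) | T == 0;          % true for entries that must be filled
d = diff([false; bad; false]);    % +1 where a bad run starts, -1 just after one ends
runStart = find(d == 1);          % first index of each bad run
runStop  = find(d == -1) - 1;     % last index of each bad run

Tout = T;
for k = 1:numel(runStart)
    s = runStart(k);
    if s == 1
        continue                  % leading run: nothing precedes it, so leave it as-is
    end
    % The valid run in front of this bad run starts right after the previous
    % bad run (or at index 1 if there is no previous bad run) and ends at s-1.
    if k > 1
        goodStart = runStop(k-1) + 1;
    else
        goodStart = 1;
    end
    Tout(s:runStop(k)) = mean(T(goodStart:s-1));
end

disp(Tout.')
```

The loop body runs once per run of zeros/NaNs rather than once per row, so for a few thousand rows it should be cheap, although the actual timing depends on the data.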
Peter Perkins
on 5 Apr 2023
This should be achievable using a grouped varfun or rowfun on a table. The trick is to define the groups, which are "each run of NaNs/zeros and the corresponding run of non-NaNs/non-zeros that precedes it." All you'd then need to do is write a function that computes the mean of the non-NaN/non-zero values in a vector and assigns it to the NaN/zero values.
Finding the groups is a useful exercise involving isnan, ==, diff, and cumsum. It might be difficult.
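One way to do that exercise, sketched under the same assumptions as the question's expected output (fill value = mean of the entire preceding valid run; leading zeros/NaNs untouched), builds the group labels with isnan, ==, diff and cumsum and then uses accumarray for the per-group means in place of a grouped varfun. The variable names are illustrative.

```matlab
% Sample column from the question
T = [0;0;1;2;3;4;NaN;4;3;0;0;0;NaN;4;2;3;0;2;0];

bad  = isnan(T) | T == 0;                % entries that must be filled
good = ~bad;

% A new group starts at the first element of every valid run, i.e. where
% bad goes from true to false (or at element 1 if the data start valid).
startsGood = [good(1); diff(bad) == -1];
g = cumsum(startsGood);                  % valid run k and the bad run after it get label k;
                                         % leading bad entries get label 0

% Mean of the valid values in each group (labels 1..max(g))
groupMean = accumarray(g(good), T(good), [], @mean);

% Fill only the bad entries that have a valid run before them (label > 0)
fillMask = bad & g > 0;
Tout = T;
Tout(fillMask) = groupMean(g(fillMask));

disp(Tout.')
```

Every positive label contains at least one valid value by construction, so accumarray never has to average an empty group.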
"the code above is just hiding all the looping it's doing internally to find the groups and then use splitapply which is a looping construct internally, as well."
No, the internals of the code use no loops. However, I do agree it is advisable to provide an example where the code fails (though probably in the original thread where this was posted; there's no reason for a new thread).
dpb
on 5 Apr 2023
Of course there are loops inside... all vectorized MATLAB code eventually devolves to a loop underneath that iterates over the contents of the array. It's just a question of whether it's visible at the user level or not...
Matt J
on 5 Apr 2023
Well yes, but the issue for the OP is avoiding M-Coded loops, right?
Peter Perkins
on 5 Apr 2023
I'm not sure which "code above" was being referred to, but there's a difference between loops in M over each row of an array, loops in M over groups of rows in an array (found using logical indexing), and loops way down deep in the compiled code. It's not always true that loops in M are a bottleneck; it all depends on what is in the loop body.
Something is apparently using splitapply, which is a "loop in M over groups of rows in an array (found using logical indexing)", as is almost everything that does grouped calculations, except for accumarray, the father of them all.
dpb
on 5 Apr 2023
The point of my comment to the OP was not to obsess over avoiding a for...end loop; although I didn't say so explicitly, with the current implementation of the compiler, M loops are really not the bottleneck they were when the "avoid loops" mantra was coined.
If it's easy to vectorize the code, sure, do so. If building the paraphernalia needed to apply one of the vectorized solutions starts getting very complicated, then just go with the "dead-ahead" solution and see... it may very well be fast enough that there's no point in spending more time trying to use the other constructs.
That's basically what Peter amplifies on above; my suggestion was that if one builds that set of groups by locating the start/end of each section, one can iterate over them pretty quickly, I'd expect.
Matt J
on 5 Apr 2023
I revise what I said before: groupTrue() and groupLims() use no M-coded for-loops. groupFcn() does invoke splitapply() and whatever loops that contains.
@Peter Perkins wrote "Finding the groups is a useful exercise involving isnan, ==, diff, and cumsum. It might be difficult."
I'm buried at the moment and didn't see any data to try it on anyway, but for this I'd strongly recommend the OP look into @Jan Simon's <FEX submission RunLength>. I'll bet it'll give the OP the starting point and length of each of the desired runs with no effort other than downloading the MEX file and calling it. It's a masterful piece of work, as one would expect from Jan; I can't count the number of times it's saved my p(roverbial)a(ppendage). :)
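If downloading a MEX file isn't an option, the start and length of each run can also be found in plain MATLAB. This sketch is not RunLength itself and makes no claim about that submission's interface; it works on the logical mask of zeros/NaNs rather than on the raw values, because NaN ~= NaN would otherwise split every NaN into a run of its own. The variable names are illustrative.

```matlab
% Sample column from the question
T = [0;0;1;2;3;4;NaN;4;3;0;0;0;NaN;4;2;3;0;2;0];

bad = isnan(T) | T == 0;              % mask of values that need filling
n = numel(bad);

isStart  = [true; diff(bad) ~= 0];    % true at the first element of every run in the mask
startIdx = find(isStart);             % start index of each run
len      = diff([startIdx; n + 1]);   % length of each run
runIsBad = bad(startIdx);             % true if the run consists of zeros/NaNs

disp(table(startIdx, len, runIsBad))
```

With these, a bad run r > 1 is always preceded by valid run r-1, which spans `T(startIdx(r-1) : startIdx(r)-1)`, so its fill value is the mean of that slice.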
Margarida
on 6 Apr 2023
dpb
on 6 Apr 2023
My first foray into the fray in response to "I think sometimes my data has too many NaNs or zeros and I get errors like this..." was to observe that:
"I'd venture it's not enough, rather than too many. If there were an empty result that would produce the message."
"Just sayin..." <vbg>
Margarida
on 7 Apr 2023
Answers (1)
Just for fun: a varfun solution. It would look similar using grouptransform.
Step 1 is to define groups of elements by finding runs of non-NaN/nonzero values followed by runs of NaN/zero values.
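x = [0;0;1;2;3;4;NaN;4;3;0;0;0;NaN;4;2;3;0;2;0];
x(x == 0) = NaN;               % treat zeros the same as NaNs
i = isnan(x);                  % true for values that need filling
starts = [false; diff(i) < 0]; % true where a non-NaN run begins right after a NaN run
group = cumsum(starts);        % each group = one non-NaN run plus the NaN run that follows it
T = table(x,i,starts,group)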
Step 2 is to replace NaNs with the group means.
T2 = varfun(@myFun,T,InputVariables="x",GroupingVariables="group")
T.xFilled = T2.myFun_x
function x = myFun(x)
% Replace the NaNs in one group with the mean of that group's non-NaN values
m = mean(x,"omitmissing");
x(isnan(x)) = m;
end