How to manage NaN values and calculate mean under conditions

Hi guys! I need your help on that.
In the attached file I have daily values of Temperature. What I want to calculate for every daily value is the Factor
Factor(i) = (Ti+Ti-1+Ti-2)/3
The problem starts when trying to "tidy up" NaN values:
If one of Ti,Ti-1,Ti-2 is NaN, then
Factor(i) = (Ti+Ti-1)/2 [assuming that Ti-2=NaN].
If two of Ti,Ti-1,Ti-2 are NaN, then
Factor(i) = Ti [assuming that Ti-2=NaN=Ti-1].
If all of Ti,Ti-1,Ti-2 are NaN, then
Factor(i) = 'NaN'
In the (most hopeful) case that none of the three are NaN, then
Factor(i) = (Ti+Ti-1+Ti-2)/3
Here is what I have done so far, but it doesnt work as expected
Daily_T = Imported_data.Tmean;
Daily_T = array2table(Daily_T);
[col] = height(Daily_T);
Factors = zeros(4018,2);
for i = 1:col
if (isnan(Daily_T{i,1}))
Factors(i,1) = (1/2)* (Daily_T{i-1,1}+Daily_T{i-2,1});
elseif (isnan(Daily_T{i-1,1}))
Factors(i,1) = (1/2)* (Daily_T{i,1}+Daily_T{i-2,1});
elseif (isnan(Daily_T{i-2,1}))
Factors(i,1) = (1/2)* (Daily_T{i,1}+Daily_T{i-1,1});
elseif (isnan(Daily_T{i,1})) && (isnan(Daily_T{i-1,1}))
Factors(i,1) = Daily_T{i-2,1}
elseif (isnan(Daily_T{i,1})) && (isnan(Daily_T{i-2,1}))
Factors(i,1) = Daily_T{i-1,1}
elseif (isnan(Daily_T{i-1,1})) && (isnan(Daily_T{i-2,1}))
Factors(i,1) = Daily_T{i,1}
elseif (isnan(Daily_T{i,1})) && (isnan(Daily_T{i-1,1})) && (isnan(Daily_T{i-2,1}))
Factors(i,1) = 'NaN';
else
Factors(i,1) = (1/3)* (Daily_T{i,1}+Daily_T{i-1,1}+Daily_T{i-2,1});
end
end
However, Factors are not built as it should... Can anyone point out where is the flaw of my code please?
PS. I'm on Matlab 2019a

2 Kommentare

Read about nanmean. Also there are options to give omitnan while calculating means. Read the documentation.
How can I use nanmean in (let's say)
Factors(i,1) = (1/2)* (Daily_T{i-1,1}+Daily_T{i-2,1}) ?
When I tried earlier, I got error.

Melden Sie sich an, um zu kommentieren.

 Akzeptierte Antwort

Simon Chan
Simon Chan am 21 Jul. 2021
Bearbeitet: Simon Chan am 21 Jul. 2021
Try the following:
summation = sum for each group of data (Each group has 3 data)
notnandata = count the number of data which are not NaN. So if the entire group of data contains only NaN, it will output zero. And if this value is zero, set it to NaN.
Noticed that Factor(1) calculate the mean value for rawdata.Tmean(1:3). If you don't like this pattern, you may need to adjust the indexing yourself.
rawdata=readtable('Daily data.xlsx');
idx = 1:size(rawdata.Tmean,1)-2;
summation = arrayfun(@(x) sum(rawdata.Tmean(x:x+2),'omitnan'),idx,'UniformOutput',false);
notnandata = cell2mat(arrayfun(@(x) sum(~isnan(rawdata.Tmean(x:x+2))),idx,'UniformOutput',false));
notnandata(notnandata==0)=NaN;
Factor = cell2mat(summation)./notnandata

3 Kommentare

Simon thanks for your contribution!
  1. Shouldn't Factor have size 31*1 (as movmean has) instead of 29*1?
  2. Can your script be somehow modified in order to calculate Factor_2 (i) = (Ti-3+T-4-1+...+Ti-32)/30 ?
I tried idx = 1:size(Dailydata1.Tmean,1)-32;
summation = arrayfun(@(x) sum(Dailydata1.Tmean(x-32:x-3),'omitnan'),idx,'UniformOutput',false);
notnandata = cell2mat(arrayfun(@(x) sum(~isnan(Dailydata1.Tmean(x-32:x-3))),idx,'UniformOutput',false));
notnandata(notnandata==0)=NaN;
Factor = cell2mat(summation)./notnandata;
but I got Array indices must be positive integers or logical values.
PS. See attached file, which has enough rows to work.
Yes, the size for 'Factor' in your previous example is 29*1 only, because the size was deducted by 2 in the code ( idx = 1:size(rawdata.Tmean,1)-2)
So for your new data, it can be modified a little bit as follows. Note that the code just calculate the entire data in the column and you need to define which date is Ti.
If Ti is 2 Feb 2006, Ti-3 to Ti-32 would be 30 Jan 2006 to 1 Jan 2006 if I understand correctly and this is the first data in 'Factor30'.
Of course, using movmean suggested by dpb is much simpler and powerful.
rawdata=readtable('Daily data (1).xlsx');
num_days = 30; % Modify a little bit and you can assign number of days here
idx = 1:size(rawdata.Tmean,1)-num_days+1;
summation = arrayfun(@(x) sum(rawdata.Tmean(x:x+num_days-1),'omitnan'),idx,'UniformOutput',false);
notnandata = cell2mat(arrayfun(@(x) sum(~isnan(rawdata.Tmean(x:x+num_days-1))),idx,'UniformOutput',false));
notnandata(notnandata==0)=NaN;
Factor30 = cell2mat(summation)./notnandata
The modification for Factor30 works! Thanks!
But what is the difference between your approach and
idx = 33:size(Dailydata1.Tmean,1);
summation = arrayfun(@(x) sum(Dailydata1.Tmean(x-32:x-3),'omitnan'),idx,'UniformOutput',false);
notnandata = cell2mat(arrayfun(@(x) sum(~isnan(Dailydata1.Tmean(x-32:x-3))),idx,'UniformOutput',false));
notnandata(notnandata==0)=NaN;
Factor30 = cell2mat(summation)./notnandata;
that I tried myself? Besides the length, which in my case is 3986*1 and in yours 3989*1 .
In addition, I would use movmean as dpb suggested but I don't know how to make it work for Factor30.

Melden Sie sich an, um zu kommentieren.

Weitere Antworten (2)

dpb
dpb am 21 Jul. 2021
Bearbeitet: dpb am 21 Jul. 2021
M=movmean(Daily,[0 2],'omitnan');
or, for the specific file
>> tDaily=readtable('Daily data.xlsx');
>> tDaily.MTmean=movmean(tDaily.Tmean,[0,2],'omitnan');
>> head(tDaily)
ans =
8×3 table
Daily_Date Tmean MTmean
___________ _____ ______
01-Jan-2006 9.33 11.68
02-Jan-2006 11.90 13.04
03-Jan-2006 13.80 12.76
04-Jan-2006 13.43 11.53
05-Jan-2006 11.05 9.92
06-Jan-2006 10.12 8.39
07-Jan-2006 8.59 7.52
08-Jan-2006 6.45 6.24
>>

4 Kommentare

dpb thnak you for your answer!
I have two questions, in case you can help:
  1. What is the difference between M=movmean(Daily,[0 2],'omitnan'); and MovAv = movmean(Daily, 3, 1, 'omitnan'); that I tried myself? The output is slightly different.
  2. Can movmean be somehow used in order to calculate Factor_2 (i) = (Ti-3+T-4-1+...+Ti-32)/30 ?
dpb
dpb am 22 Jul. 2021
Bearbeitet: dpb am 22 Jul. 2021
  1. 3,1 is a 3-pt centered MA over dimension 1 (rows or by column) of the input data while [0,2] is the 3-pt MA 0 pts before and 2 after the data point -- what you said you wanted, not the first.
  2. I can't interpret what the second is supposed to be, but "movmean(A,[kb kf]) computes the mean with a window of length kb+kf+1 that includes the element in the current position, kb elements backward, and kf elements forward."(*)
(*) However, if the written subscripting expression is intended to be over elements [-32 -3] from each point, there's nothing to prevent you from augmenting the vector by three extra elements and operating over [29 0] with movmean and then discarding the first three of the result.
Read the documentation for movmean, it explains explicitly how each combination of inputs is used to select which elements for averaging.
Regarding question 2:
[29 0] means that moving average starts from current day and extends 29 days before that, correct?
In Factor_2 (i) = (Ti-3+T-4-1+...+Ti-32)/30 I want to start 3 days before current day (Ti-3) and extend up until 32 days before current day (Ti-32).
Could M=movmean(Daily,[0 32],'omitnan'); and then discarding the first three averages do the trick?
Is this way of thinking efficient?
dpb
dpb am 22 Jul. 2021
Bearbeitet: dpb am 22 Jul. 2021
  1. "movmean(A,[kb kf]) computes the mean with a window of length kb+kf+1 that includes the element in the current position, kb elements backward, and kf elements forward."
  2. From 1. above, ergo [0 32] would be from 0 to 32 past point i, not before.(*)
Again, "read the documentation" combined with experimenting with a sample dataset short enough to be able to watch the results; simply using 1:10 so can easily verify what the results should be/how are calculated would be an ideal debugging tool. You don't need all 32 to test what the various combinations are how how to manipulate the series to get what you're shooting for.
(*) Of course, you could fliplr() the series, then do the averaging and fliplr() back, but why not just put the point offset in in the correct order to begin with? The offset would be needed to use movmean with both elements negative; TMW didn't think of that possibility and won't accept anything <0 as the second argument. There's no real reason it couldn't; just that they didn't think of it -- all that would be required is that the start index be less than the ending one.

Melden Sie sich an, um zu kommentieren.

Another possibility that uses and preserves a timetable:
>> tt = readtimetable("Daily data.xlsx");
>> head(tt)
ans =
8×1 timetable
Daily_Date Tmean
___________ ______
01-Jan-2006 9.3282
02-Jan-2006 11.901
03-Jan-2006 13.802
04-Jan-2006 13.427
05-Jan-2006 11.052
06-Jan-2006 10.124
07-Jan-2006 8.5933
08-Jan-2006 6.4544
>> ttSm = smoothdata(tt,'movmean',days([0,2]),'omitnan');
>> head(ttSm)
ans =
8×1 timetable
Daily_Date Tmean
___________ ______
01-Jan-2006 11.677
02-Jan-2006 13.043
03-Jan-2006 12.76
04-Jan-2006 11.534
05-Jan-2006 9.9229
06-Jan-2006 8.3905
07-Jan-2006 7.5239
08-Jan-2006 6.2444

Kategorien

Mehr zu Data Preprocessing finden Sie in Hilfe-Center und File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by