if it is greater than 75 percent missing data

Question

Fabian Moreno am 26 Okt. 2020

0
Verknüpfen

Direkter Link zu dieser Frage

https://de.mathworks.com/matlabcentral/answers/626688-if-it-is-greater-than-75-percent-missing-data

Beantwortet: Peter Perkins am 19 Nov. 2020

Hello there

I have a matrix of (10200x7) from 01-01-2019 to 29-02-2020, which is iqual to the hourly temperature measured. The data is organizaed in the first 6 columns with date information (yy-mm-dd-hh-mm-ss) and then my temperature data (column 7). I already got the nanmean dayli values for a new matrix. But I would like to know how can I get a conditional, for example: if more than 75% of my data each 24 hours is missing, give me NaN for that day. If not, give me nanmean value.

I'm trying something like this, but I'm pretty sure I'm on the wrong way

%first I find the hourly values positions after 0 hs and before 23 hs
x = find(files_mat{1,1}(:,5)>= 0 {1,1}(:,5)<=23);
%%then, I'm trying to get the conditional to the tempearture column
 
if isnan (files_mat{1,1}(x,7))<= 0.75
      M_24h
 elseif isnan (files_mat{1,1}(x,7))>= 0.75
     M_24h

Thank you

2 Kommentare
Keine anzeigenKeine ausblenden

Rik am 26 Okt. 2020

I don't fully understand the shape of your data. But if you have an array, you can call isnan and get out a logical array. Using mean you can find the percentage of NaNs in each row, and using the comparator you will end up with a logical vector you can use to index into your output array and correct all non-NaNs.

I'm not posting this as an answer, as I can't really write a working example for you without understanding your data.

Fabian Moreno am 26 Okt. 2020

Hello Rik, what I posted is something that I'm trying to do. But I don't really know how to get the statment.

The only thing that I would like to get is the final value. I mean, I have hourly data from 0 hs to 23 hs each day, some days have NaN values between 0 to 23 hs. How can I do if i want to get mean dayli values with a statment that says, if the data have more than 6 hours missind data, give me NaN value. If not, give me the nanmean values.

For example the 75% of 24 hs is 18, and I have 16 valid values then the mean of that day would be (NaN). But if I have more than 18 valid values, bring me the mena values of that day.

hour:[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 ]

temp: [10 11 10 12 10 10 13 14 15 15 16 16 17 NaN NaN NaN NaN NaN NaN NaN NaN 14 15 14]

Melden Sie sich an, um zu kommentieren.

Melden Sie sich an, um diese Frage zu beantworten.

Answer 1

Peter Perkins am 19 Nov. 2020

0
Verknüpfen

Direkter Link zu dieser Antwort

https://de.mathworks.com/matlabcentral/answers/626688-if-it-is-greater-than-75-percent-missing-data#answer_549293

In MATLAB Online öffnen

I think this would be much simpler if you created a timetable by combining the first 6 vars in the table. Actually, I have no idea if you even have a table, but you should. Use readtable to read in your data.

Given a timetable, this is a one-liner: use retime to aggregate to daily means, except use your own custom function that returns NaN if there's more than 18 missing hourly values. Something like this:

ttDaily = retime(ttHourly,'daily',myFun)
function y = myFun(x)
if sum(isnan(x)) > .75*length(x)
    y = NaN;
else
    y = mean(x,'omitnan');
end

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

Melden Sie sich an, um zu kommentieren.

Answer 2

Adam Danz am 26 Okt. 2020

0
Verknüpfen

Direkter Link zu dieser Antwort

https://de.mathworks.com/matlabcentral/answers/626688-if-it-is-greater-than-75-percent-missing-data#answer_525123

Bearbeitet: Adam Danz am 27 Okt. 2020

In MATLAB Online öffnen

Here's an approach you can apply to your data,

% Create a table of year-month-day-hour-min-second-rain data with some of the hour data missing (NaN)
dateTimeVals = datetime(1999,1,1) + hours(1:5000)'; 
rain = rand(size(dateTimeVals)); 
T = array2table([datevec(dateTimeVals), rain], 'VariableNames',{'Y','M','D','H','Mn','S','rain'}); 
% Replace some Hour data with missing vals
rng('default')
T.H(randi(height(T),1,3000)) = NaN; 
% Show top of table
head(T) 
ans = 8x7 table
     Y      M    D     H     Mn    S      rain   
    ____    _    _    ___    __    _    _________

    1999    1    1      1    0     0     0.086662
    1999    1    1    NaN    0     0      0.17704
    1999    1    1    NaN    0     0    0.0087032
    1999    1    1    NaN    0     0     0.095159
    1999    1    1    NaN    0     0      0.17214
    1999    1    1    NaN    0     0      0.47676
    1999    1    1    NaN    0     0      0.43654
    1999    1    1    NaN    0     0      0.14804
% Add column to identify valid hour-data
T.dateValid = ~isnan(T.H); 
head(T) 
ans = 8x8 table
     Y      M    D     H     Mn    S      rain       dateValid
    ____    _    _    ___    __    _    _________    _________

    1999    1    1      1    0     0     0.086662      true   
    1999    1    1    NaN    0     0      0.17704      false  
    1999    1    1    NaN    0     0    0.0087032      false  
    1999    1    1    NaN    0     0     0.095159      false  
    1999    1    1    NaN    0     0      0.17214      false  
    1999    1    1    NaN    0     0      0.47676      false  
    1999    1    1    NaN    0     0      0.43654      false  
    1999    1    1    NaN    0     0      0.14804      false  
% Compute the percentage of valid hour-data per day
% This is achieved by applying mean() to the binary 
% 'dateValid' column for each day.
Tgs = groupsummary(T,{'Y','M','D'},'mean', 'dateValid');
head(Tgs)
ans = 8x5 table
     Y      M    D    GroupCount    mean_dateValid
    ____    _    _    __________    ______________

    1999    1    1        23           0.47826    
    1999    1    2        24             0.625    
    1999    1    3        24           0.58333    
    1999    1    4        24           0.54167    
    1999    1    5        24           0.58333    
    1999    1    6        24           0.66667    
    1999    1    7        24           0.45833    
    1999    1    8        24               0.5    
% This shows that for the first day of the first month of 1999, 
% 47.8% of hour-data are non-nan. Let's confirm that manually. 
daySelection = T.Y==1999 & T.M==1 & T.D==1;  
mean(T.dateValid(daySelection),'omitnan') * 100
ans = 47.8261
% To identify which days have >75% hour-data, 
Tgs(Tgs.mean_dateValid > 0.75, 1:3)
ans = 5x3 table
     Y      M    D 
    ____    _    __

    1999    1     9
    1999    3     5
    1999    4    28
    1999    5    24
    1999    7    21

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

Melden Sie sich an, um zu kommentieren.

if it is greater than 75 percent missing data

2 Kommentare
Keine anzeigenKeine ausblenden

Akzeptierte Antwort

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

Weitere Antworten (1)

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

Siehe auch

Kategorien

Tags

Community Treasure Hunt

if it is greater than 75 percent missing data

2 Kommentare Keine anzeigenKeine ausblenden

Akzeptierte Antwort

0 Kommentare -2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

Weitere Antworten (1)

0 Kommentare -2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

Siehe auch

Kategorien

Tags

Community Treasure Hunt

2 Kommentare
Keine anzeigenKeine ausblenden

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden