How to remove outliers in large data sets?
MUKESH KUMAR on 7 Jan 2022
Commented: Image Analyst on 17 Jan 2022
I am unable to open the example code for outliers: neither openExample('matlab/RemoveOutliersInVectorExample') nor openExample('matlab/DetermineOutliersWithStandardDeviationExample') works.
I have a large dataset of power load covering three years at 30-minute intervals, and I want to remove the outlier points that are inflating my forecasting error.
Any help removing the outliers from such a dataset would be appreciated.
The attached reference image shows the outliers in its upper panel; the forecasting error at those points is also high, as the lower panel shows.
Thanks
2 Comments
Image Analyst on 7 Jan 2022
How large is the data set? How many gigabytes? Can you attach a smaller set (less than 5 MB) in a .zip file?
Accepted Answer
Image Analyst on 8 Jan 2022
Try this:
data = readmatrix('Copy of data.xlsx');
x = data(:, 1);
y = data(:, 2);
% Plot just the first cycle.
last = round(70000/3);
x = x(1:last);
y = y(1:last);
subplot(2, 1, 1);
plot(x, y, 'b-')
grid on;
title('Showing One Cycle Only')
% Smooth the data.
windowWidth = 2001;	% Some large odd number.
smoothY = movmean(y, windowWidth);
hold on;
plot(x, smoothY, 'r-', 'LineWidth', 3)
% Compute difference between actual and smoothed.
diffy = y - smoothY;
subplot(2, 1, 2);
plot(x, diffy, 'b-');
grid on;
% Flag outliers: points whose absolute deviation from the smoothed curve exceeds 900.
outlierIndexes = abs(diffy) > 900;
% Plot outliers as red dots over the original data.
subplot(2, 1, 1);
hold on
plot(x(outlierIndexes), y(outlierIndexes), 'r.', 'MarkerSize', 7);
% Now remove outliers from x and y
x(outlierIndexes) = [];
y(outlierIndexes) = [];
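
The threshold of 900 above was read off the residual plot for this particular file. As a rough sketch of one alternative, the threshold can be derived from the residual's own spread, using the scaled median absolute deviation (1.4826 times the MAD, which approximates the standard deviation for normally distributed residuals). Everything below is an assumption for illustration: the synthetic series, the spike positions, the window width of 201, and the multiplier k = 6 are hypothetical and would need tuning on the real load data.

% Synthetic stand-in for the load series (assumption: not the poster's data).
rng(0);                                   % reproducible noise
n = 5000;
x = (1:n)';
y = 3000 + 500*sin(2*pi*x/n) + 50*randn(n, 1);
spikeIdx = [400 1500 3200 4700];          % hypothetical outlier positions
y(spikeIdx) = y(spikeIdx) + 2000;         % inject large spikes

% Smooth and take the residual, as in the code above.
windowWidth = 201;                        % hypothetical; some odd number
smoothY = movmean(y, windowWidth);
diffy = y - smoothY;

% Robust spread of the residual: scaled median absolute deviation.
center = median(diffy);
scaledMAD = 1.4826 * median(abs(diffy - center));

% Flag points more than k scaled MADs from the residual's median.
k = 6;                                    % hypothetical multiplier
outlierIndexes = abs(diffy - center) > k * scaledMAD;

% Remove the flagged samples from both vectors.
xClean = x(~outlierIndexes);
yClean = y(~outlierIndexes);

Because the median and the MAD are barely affected by a handful of spikes, the threshold adapts to the noise level instead of being hard-coded. Note that deleting samples leaves gaps in a 30-minute series; whether that matters depends on the forecasting method.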

2 Comments
Image Analyst on 17 Jan 2022
You must have a version so old that rmoutliers was not in it yet. However, you can do it manually: smooth the curve, subtract it from your data, and threshold the difference, as I did. I didn't use rmoutliers.
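
If the release is old enough that movmean is missing as well (an assumption; the thread does not say which version is installed), the same smooth, subtract, and threshold steps can be sketched with conv alone. The sample data, window width, and threshold below are hypothetical placeholders, not values from the attached file.

% Hypothetical sample data standing in for the real load series.
x = (1:1000)';
y = 100 + 0.05*x;                         % slow linear trend
y([200 750]) = y([200 750]) + 500;        % two injected spikes

% Moving average via convolution with a box kernel of odd width.
windowWidth = 51;                         % hypothetical; some odd number
kernel = ones(windowWidth, 1);
windowSum = conv(y, kernel, 'same');      % sum over each centered window
% Count how many real samples fall in each window, so that the
% windows truncated at the two ends are averaged correctly.
windowCount = conv(ones(size(y)), kernel, 'same');
smoothY = windowSum ./ windowCount;

% Difference between the actual and the smoothed data.
diffy = y - smoothY;

% Threshold the absolute difference to find the outliers.
threshold = 100;                          % hypothetical; choose from a plot of diffy
outlierIndexes = abs(diffy) > threshold;

% Remove the outliers from x and y.
x(outlierIndexes) = [];
y(outlierIndexes) = [];

Dividing by windowCount rather than by windowWidth keeps the average from being pulled toward zero at the start and end of the series, where conv pads with zeros.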