Searching through files for missing data
Hi,
I have a set of 8000 files whose names are in the format
YYYYMMDDHHMMSS
year/month/day/hour/minute/second
The file names should increase in 5-minute steps, and I need to write a function that checks that the files are named consistently and identifies any that are missing.
The functions strcmp and join have been recommended to me.
Does anyone know how to do this?
Accepted Answer
More Answers (1)
Adam Danz
on 4 Feb 2021
1 vote
>Does anyone know how to do this?
Lots of people know how to do this and we're here to help but few people will devote a portion of their day to do it for you.
Let's start by figuring out where you're stuck. There are just a few basic steps in your process and you can find lots of information in this forum, on the web, and in the documentation for each step.
- Get a list of files. See dir()
- Read in the file. There are lots of ways to read files depending on the filetype and content (review).
- Are your time stamps in datetime format? If not convert them to datetime.
- If all you want to do is check whether a file is missing, you only need to store three data points per file, in two variables, inside your loop: the first and last datetime values go in an n-by-2 array for n files, and the file name goes in an n-by-1 string array.
- Once all files are read and the 3 data points are stored for each file, you can sort the datetime values in case the files are read out of order and then compare the first datetime of file n with the last datetime from file n-1. If that difference is more than 5 minutes, you know you're missing a file and you can use the filename array to help identify which file is missing.
If you get stuck on any step leave a comment below and show us where you're at with the code and what the problem is.
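As a rough illustration of the last two steps, here is a minimal sketch that assumes the first and last timestamps of each file have already been collected. The file names, the timestamps, and the variable names (names, firstLast, gap) are made up for the example.

```matlab
% Hypothetical data: first and last timestamp found in each of four files.
names = ["a.hdf"; "b.hdf"; "c.hdf"; "d.hdf"];       % made-up file names
t0 = datetime(2021,2,4,12,0,0);
firstLast = [t0,               t0 + minutes(4);
             t0 + minutes(5),  t0 + minutes(9);
             t0 + minutes(15), t0 + minutes(19);    % no file starts at 12:10
             t0 + minutes(20), t0 + minutes(24)];

% Sort by first timestamp in case the files were listed out of order.
[~, order] = sort(firstLast(:,1));
firstLast = firstLast(order,:);
names = names(order);

% Gap between the start of file n and the end of file n-1.
gap = firstLast(2:end,1) - firstLast(1:end-1,2);
idx = find(gap > minutes(5));

% Report the pair of files on either side of each gap.
for k = idx'
    fprintf('Gap of %g min between %s and %s\n', minutes(gap(k)), names(k), names(k+1));
end
```

The two file names printed for each gap bracket the missing file, which is how the filename array helps locate it.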
9 Comments
Stephen23
on 4 Feb 2021
I suspect that the difference is not exactly five minutes... most likely some tolerance will be required in the comparison.
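A small sketch of what such a tolerance could look like. The timestamps and the 30-second tolerance are assumptions chosen for illustration, not values taken from the actual data.

```matlab
% Hypothetical timestamps: nominally 5 minutes apart, with a few seconds of jitter.
t = datetime(2021,2,4,12,0,0) + minutes([0 5 10 20 25]) + seconds([0 2 -1 3 0]);

step = minutes(5);      % expected spacing
tol  = seconds(30);     % assumed tolerance; choose a value that suits your data

d = diff(t);                  % durations between consecutive timestamps
strictGap = d > step;         % also flags the harmless 5 min 2 s step
isGap     = d > step + tol;   % flags only the step that is clearly too long
```

Without the tolerance, the first interval (5 minutes and 2 seconds) would be reported as a gap even though no file is missing.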
drb17135
on 4 Feb 2021
Oh, I see. So, the datetime values are stored in the filename. That's even easier. There's no need to read in the files.
New list of tasks:
- Get a list of files. See dir()
- Loop through each file name and isolate the datetime portion of the file name. This can be done several ways: by parsing the final segment between the last underscore and the file extension, or by using regular expressions.
- Convert the isolated time stamp string to a datetime value using t = datetime(DateStrings,'InputFormat',infmt) and store those datetime values in an nx1 array.
- After all file names are analyzed, differentiate the datetime vector and determine if any durations are more than 5 minutes (also consider Stephen's advice about tolerance).
I suspect step 2 will be most difficult for beginners. Give it a shot and circle back if you get stuck.
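For orientation, the four tasks might fit together roughly as follows. This is only a sketch: the file names are hypothetical stand-ins for what dir() would return, and it assumes every name ends in a 14-digit timestamp followed by the extension (a name that does not match would cause an error in the cellfun line).

```matlab
% Hypothetical file names; in practice these would come from dir(), e.g. {files.name}'.
names = {'T_PAAH72_C_EIDB_20210204120000.hdf'
         'T_PAAH72_C_EIDB_20210204120500.hdf'
         'T_PAAH72_C_EIDB_20210204121500.hdf'};

% Step 2: capture the 14 digits that precede the file extension.
tok = regexp(names, '(\d{14})\.\w+$', 'tokens', 'once');
stamps = cellfun(@(c) c{1}, tok, 'UniformOutput', false);

% Step 3: convert the strings to datetime and sort them.
t = sort(datetime(stamps, 'InputFormat', 'yyyyMMddHHmmss'));

% Step 4: differences between consecutive time stamps, in minutes.
gapMinutes = minutes(diff(t));
```

Any entry of gapMinutes that is noticeably larger than 5 points to one or more missing files between the two neighbouring time stamps.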
drb17135
on 4 Feb 2021
Example:
The first line below produces a vector of datetime values (ignore how that's done for now).
The second line uses diff() to difference the datetime vector, computing element (n+1) minus element n, and converts the result to minutes using minutes().
dt = datetime('now','Format','HH:mm') + cumsum(minutes(randi(5,1,10)))
minutes(diff(dt))
drb17135
on 11 Feb 2021
drb17135
on 11 Feb 2021
Adam Danz
on 11 Feb 2021
Looks like you're making progress.
1. In this line, you could add the file extension if it's the same for all files. That should list all of the files you need, assuming they are all in the same folder.
files = dir('C:\Users\drb17135\Documents\August_Radar\*.*')
files = dir('C:\Users\drb17135\Documents\August_Radar\*.hdf') % change to this
2. This is where things go wrong. "tmp" should be the "files" variable above. You don't need this line. Replace tmp with "files".
tmp=dir;
3. Instead of these 3 lines,
files = dir('C:\Users\drb17135\Documents\August_Radar\*.*')
file = myfile(5:5); %isolates the file in terms of 'yyyyMMddHHmmSS'
datetime(file,'InputFormat','yyyyMMddHHmmSS'); %gives name in format of '10-Sep-2020 04:00:00' - for example
Use these two, based on example: 'T_PAAH72_C_EIDB_19991231134501.hdf' (YYYYMMDDHHmmSS)
% >>files(idx).name
% ans =
% 'T_PAAH72_C_EIDB_19991231134501.hdf'
[~, timestamp] = regexp(files(idx).name, '([0-9]+)\.hdf$','match','once','tokens'); % escape the dot so it matches only a literal '.'
% Which returns
% timestamp =
% {'19991231134501'}
timestampDT = datetime(timestamp{1},'InputFormat','yyyyMMddHHmmss')
% Which returns
% timestampDT =
% datetime
% 31-Dec-1999 13:45:01
4. Instead of assuming you have 8000 files, use the actual number of files identified to define the loop.
for idx = [1:8000] % Not this
for idx = 1:numel(files) % use this, where "files" is defined in my step #1 above.
5. Lastly, you need to store the datetime stamps in the loop so the loop should be structured like this (using variable names above).
timestampDT = NaT(numel(files),1); % preallocate the loop variable (NaT is case-sensitive)
for idx = 1:numel(files)
% < PUT YOUR OTHER STUFF HERE >
% store all datetimes from the file names
timestampDT(idx) = datetime(timestamp{1},'InputFormat','yyyyMMddHHmmss');
end
Then you can differentiate the time stamps:
dt = diff(timestampDT)
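Once the gaps are known, one possible way to list the missing time stamps themselves is to compare against the full expected sequence. The sketch below uses made-up time stamps and assumes the file times fall exactly on the 5-minute grid; with jittered times you would need to round them first (for example with dateshift) or fall back on a tolerance. The names expected and missingDT are introduced here for illustration.

```matlab
% Hypothetical time stamps recovered from file names (the 12:15 file is absent).
timestampDT = datetime(2021,2,4,12,0,0) + minutes([0; 5; 10; 20; 25]);

% Every time stamp expected between the first and last file, 5 minutes apart.
expected = (min(timestampDT) : minutes(5) : max(timestampDT))';

% Expected time stamps that have no matching file.
missingDT = setdiff(expected, timestampDT);
```

Converting missingDT back to text with the 'yyyyMMddHHmmss' format would give the time stamp portion of each missing file name.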