Use terminal to speed up file removal

Pete on 17 Oct 2017
Answered: Stephen23 on 17 Oct 2017
Hi all, I've got a large number of CSVs, one generated each time a system changes state. Each CSV starts as a single-row [1x3] array, and any data is added as a new row. I've written a simple loop that checks for "empty" CSVs (those containing only the single row) and removes them. However, this takes many (>10) minutes to complete, and I want to try the same in the terminal. Code as shown:
CSV_Filenames_STRUCT = dir(sprintf('%s/*.csv',ResultDirectory));
CSV_Filenames_CELL = {CSV_Filenames_STRUCT.name};
StartingNumberOfFiles = size(CSV_Filenames_CELL,2);
for NthFile = 1:StartingNumberOfFiles
NumberOfPeaks = size(textread(sprintf('%s/%s',ResultDirectory,CSV_Filenames_CELL{1,NthFile}),'%s'),1) - 1; % Number of rows less one for the 'x,y,value'
if ~NumberOfPeaks % Essentially empty
delete(sprintf('%s/%s',ResultDirectory,CSV_Filenames_CELL{1,NthFile}));
end
end
I've not used the terminal much, and I'm wondering whether it would be faster for the above when there are many files to process, and how to code the single-line check. So far, I've got something like:
for f in *.csv;
do
L=`wc -l "$f" | awk '{print $1}'`
if test $L -eq 1
then
mv $f ./MT;
fi
done
which isn't quite working (there are spaces in the filenames, as shown below). I'm out of my depth here, so I'm calling for help on how to use the "system"/"unix" options through MATLAB. I'm running OS X and Kubuntu Linux. I should also mention that the filenames have spaces in them, like: "Filter 0000001 Fwd,Alignment Black Screen - Ref_01 Input_19 (2017-10-17 @ 13.30.20.103).csv"
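For reference, the most likely failure point in the loop above is the unquoted `$f` in the `mv` call: the shell splits a filename containing spaces into several arguments. The `MT` folder also has to exist before anything is moved into it. Below is a minimal sketch of a quoted version. The function name `move_single_line_csvs` is made up for illustration, and the sketch assumes every row (including the header) ends with a newline, because `wc -l` counts newline characters rather than rows.

```shell
#!/usr/bin/env bash
# Move every CSV in a directory that contains exactly one line into an MT/ subfolder.
# Assumption: each row, including the header row, is terminated by a newline.
move_single_line_csvs() {
    local dir=$1 f lines
    mkdir -p -- "$dir/MT"
    for f in "$dir"/*.csv; do
        [ -e "$f" ] || continue            # the glob matched nothing
        # Reading from stdin makes wc print only the count; the arithmetic
        # expansion strips the padding that some wc implementations add.
        lines=$(( $(wc -l < "$f") ))
        if [ "$lines" -eq 1 ]; then
            mv -- "$f" "$dir/MT/"          # quotes keep names with spaces intact
        fi
    done
}
```

After sourcing the file, it would be called as `move_single_line_csvs "/path/to/results"`. Saved as a script that ends with such a call, it could presumably be run from MATLAB through `system`, although that has not been tried here.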
3 Comments
Pete on 17 Oct 2017
I've just started a set with 2,000,000 files, but I only expect about 10% of these (200k) to have genuine results, so the rest are just 'empty' CSVs (one row of title data). Looking at the profiler, I think the MATLAB functions called from textread are possibly what takes the time. I've removed the sprintf calls and replaced them with string concatenation, i.e. [PathPart1 '/' PathPart2] etc. That sped things up a bit, but processing still takes a long time. Any other suggestions?
Jan on 17 Oct 2017
You mean "shell", not "terminal".


Answers (2)

Jan on 17 Oct 2017
I'm not sure if I understand your question correctly: You want to delete all files which contain one row only - correct?
FULLFILE is smarter than creating file names by sprintf().
CSV_Filenames_STRUCT = dir(fullfile(ResultDirectory, '*.csv'));
CSV_Filenames_CELL = {CSV_Filenames_STRUCT.name};
StartingNumberOfFiles = numel(CSV_Filenames_CELL);
for NthFile = 1:StartingNumberOfFiles
    File = fullfile(ResultDirectory, CSV_Filenames_CELL{NthFile});
    fid = fopen(File, 'r');
    if fid == -1, error('Cannot open file: %s', File); end
    line1 = fgetl(fid);   % first row
    line2 = fgetl(fid);   % second row, or -1 if the file ends after the first row
    fclose(fid);
    if ~ischar(line2)     % no second row: the file is "empty"
        delete(File);
    end
end
Is this faster? It reads at most the first two lines of each file.

Stephen23 on 17 Oct 2017
Remove the textread and replace it with something like this (pseudocode):
fid = fopen(...,'rt');
fgetl(fid); % read (and discard) the first row
isEmpty = feof(fid); % check if end of file
fclose(fid); % close the file before deleting it
if isEmpty
    delete(...)
end
"I've removed sprintf's and replaced with concatenation strings "
I would recommend using fullfile: it actually makes the intention clearer.
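The idea in both answers, reading no further than the second row, carries over to the shell the question asked about. The following is only a sketch under assumptions: the function name `delete_header_only_csvs` is hypothetical, and it treats any file with at most one row as "empty". It uses `awk` instead of `wc -l`, so a last row without a trailing newline is still counted as a row.

```shell
#!/usr/bin/env bash
# Delete CSVs that contain at most one row, without reading past the second row.
delete_header_only_csvs() {
    local dir=$1 f
    for f in "$dir"/*.csv; do
        [ -e "$f" ] || continue            # the glob matched nothing
        # awk exits with status 1 as soon as it reads a second record,
        # and with status 0 if the input ends before that.
        if awk 'NR > 1 { exit 1 }' < "$f"; then
            rm -- "$f"
        fi
    done
}
```

This starts one `awk` process per file, so with millions of files the process start-up cost may dominate. Whether it beats the `fgetl` loop inside MATLAB would have to be measured.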
