Read specific rows from a large .csv

Lorenzo on 6 Jul 2016
Commented: Steven Hunsinger on 14 Sep 2022
Hi,
I'm looking for a fast way to handle a big .csv file (35 MB). The good part is that I only need a certain part of the file. Basically, I would like to read only the rows that start with a certain name.
Unfortunately the file is composed like this:
Varname_1 timestring(t=0) valueX valueY
Varname_2 timestring(t=0) valueX valueY
...
Varname_n timestring(t=0) valueX valueY
Varname_1 timestring(t=1) valueX valueY
Varname_2 timestring(t=1) valueX valueY
...
Varname_n timestring(t=1) valueX valueY
...
... and so on
My idea would be to read the .csv file line by line, check whether Varname equals e.g. Varname_1, and write the matching rows to a cell array (or 4 vectors) like this:
Varname_1 timestring(t=0) valueX valueY
Varname_1 timestring(t=1) valueX valueY
Varname_1 timestring(t=2) valueX valueY
...
Any ideas for smart code? Thank you! (Additional notes: varname = string, time = string, value = number with a comma as the decimal separator.)
------------------------------------ EDIT: example data
The output would be, e.g.:
var2 10:10:10 16,1010138923
var2 10:10:20 89,1560542863
var2 10:10:30 69,557621819
var2 10:10:40 9,9246195517
3 Comments
Lorenzo on 6 Jul 2016
Sorry! I mean the decimal delimiter is not a point; it's a comma. Example: 12,34 instead of 12.34
dpb on 6 Jul 2016
That, I think, you'll have to fix up outside MATLAB; I don't think it knows how to handle it. If the file is also comma-separated, that's a problem for certain.


Accepted Answer

Image Analyst on 6 Jul 2016
Use readtable() and then search column 1 for the filename pattern you want. Attach a small example with wanted and unwanted filenames if you can't figure it out.
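A minimal sketch of that suggestion, under stated assumptions: the sample file, its contents, and the names `fname`, `keep`, and `T2` are made up for illustration, the file is semicolon-delimited with no header row, and the values use a point as the decimal separator so that the example stays focused on the row filter.

```matlab
% Tiny sample file (hypothetical data; semicolon-delimited, no header row).
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'var1;10:10:10;1.5\n');
fprintf(fid, 'var2;10:10:10;16.1\n');
fprintf(fid, 'var1;10:10:20;4.5\n');
fprintf(fid, 'var2;10:10:20;89.2\n');
fclose(fid);

% Read everything, then keep only the rows whose first column is 'var2'.
T = readtable(fname, 'Delimiter', ';', 'ReadVariableNames', false);
keep = strcmp(T{:,1}, 'var2');   % logical mask over column 1
T2 = T(keep, :);
delete(fname);                   % clean up the sample file
```

A logical mask is enough to index the table rows, so no call to find() is needed.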

More Answers (2)

dpb on 6 Jul 2016
Edited: dpb on 6 Jul 2016
Untested, but check that the pattern matching format string doesn't solve the problem directly...
vName='Varname_1'; % the variable name you're looking for
fmt=[vName '%s %f %f']; % match vName, string, two numerics
fid=fopen('yourbigfile.csv','r');
data=textscan(fid,fmt,'delimiter',',');
fid=fclose(fid);
As said I'm not positive, but I think there's at least a reasonable chance the pattern-matching will do what you're looking for. Worth a shot methinks...
Well, doggonit, magic doesn't happen, joy didn't ensue... :(
But, the original idea isn't difficult...
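fid=fopen('yourbigfile.csv','r');
data={}; i=0;                        % collected matches and their count
while ~feof(fid)
  l=fgetl(fid);
  if strncmp(l,vName,numel(vName))   % keep only lines that start with vName
    i=i+1;
    data{i}=textscan(l,fmt);
  end
end
fid=fclose(fid);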
worked for a sample file albeit I used space-delimited and '.' as the decimal indicator; I think that'll still be a problem.
I thought
fid=fopen('yourbigfile.csv','r');
data={}; i=0;
while ~feof(fid)
  l=fgetl(fid);
  try
    i=i+1;
    data{i}=textscan(l,fmt);
  catch
  end
end
fid=fclose(fid);
would work around the issue but it didn't; textscan simply gave up and quit reading anything once it failed; it doesn't throw an error, it just throws up its hands silently. :(
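One way to sidestep textscan here is to filter and split each line by hand. The following is only a sketch under assumptions: the file is taken to be semicolon-delimited (as the readtable call later in this thread suggests), and the sample data and the names `key`, `times`, `valX`, and `valY` are hypothetical. The decimal comma is converted per field with strrep before str2double.

```matlab
% Tiny sample file (hypothetical data; ';' delimiter, ',' as decimal mark).
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'var1;10:10:10;1,5;2,5\n');
fprintf(fid, 'var2;10:10:10;16,10;3,25\n');
fprintf(fid, 'var1;10:10:20;4,5;5,5\n');
fprintf(fid, 'var2;10:10:20;89,15;6,75\n');
fclose(fid);

vName = 'var2';
key = [vName ';'];               % include the delimiter so 'var2' does not match 'var20'
times = {}; valX = []; valY = [];

fid = fopen(fname, 'r');
while true
    l = fgetl(fid);
    if ~ischar(l), break; end    % fgetl returns -1 at end of file
    if strncmp(l, key, numel(key))
        parts = strsplit(l, ';');
        times{end+1} = parts{2};                                %#ok<SAGROW>
        valX(end+1) = str2double(strrep(parts{3}, ',', '.'));   %#ok<SAGROW>
        valY(end+1) = str2double(strrep(parts{4}, ',', '.'));   %#ok<SAGROW>
    end
end
fclose(fid);
delete(fname);                   % clean up the sample file
```

Growing the arrays inside the loop is simple but not the fastest option; for a 35 MB file, preallocating or collecting the matching lines first may be worth trying.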
3 Comments
dpb on 6 Jul 2016
I used textscan not csvread, IA???
He's also got comma as the decimal indicator and says he's got a .csv file in which case it's indeterminable--which comma is a delimiter and which is a decimal point?
Image Analyst on 6 Jul 2016
Oh, sorry - I didn't notice.



Lorenzo on 6 Jul 2016
Got it. readtable() works lightning fast. This is my approach:
1) Overwrite , with . as the decimal delimiter (not strictly necessary, but I need the values as numbers for postprocessing)
2) readtable
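comma2point_overwrite('bigdata.csv')   % rewrites the file in place: ',' -> '.'
T = readtable('bigdata.csv', 'Delimiter', ';');
% keep only the rows whose first column matches the wanted name
T2 = T(strcmp(T{:,1}, 'Durchflussmessung-H2-163bar_real'), :)
clearvars T;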
where comma2point_overwrite() is:
function comma2point_overwrite( filespec )
% Replaces all occurrences of comma (",") with point (".") in a text file.
% Note that the file is overwritten, which is the price for high speed.
    file = memmapfile( filespec, 'writable', true );
    comma = uint8(',');
    point = uint8('.');
    file.Data( transpose( file.Data==comma) ) = point;
end
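If overwriting the original file is not acceptable, a slower variant can write the converted text to a copy instead. This is a sketch and not the function above: the file names `src` and `dst` and the sample content are hypothetical, and the whole file is assumed to fit in memory.

```matlab
% Hypothetical source file with a decimal comma.
src = [tempname '.csv'];
fid = fopen(src, 'w');
fprintf(fid, 'var2;10:10:10;16,10\n');
fclose(fid);

% Read the whole file as text, swap ',' for '.', and write a separate copy.
dst = [tempname '.csv'];
txt = fileread(src);
txt = strrep(txt, ',', '.');
fid = fopen(dst, 'w');
fprintf(fid, '%s', txt);
fclose(fid);

check = strtrim(fileread(dst));   % read back the converted copy
delete(src); delete(dst);         % clean up the sample files
```

As with the in-place version, this only makes sense when the comma is not also the field delimiter.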
Thanks for your help!!
1 Comment
Steven Hunsinger on 14 Sep 2022
Not so lightning fast if you get your company network involved. 67.5MB with a breakpoint after readtable. 10 minutes. This might be OK if I need all that data loaded into RAM, but seems excessive for reading the first line or so. Is there a better way?
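When only the first line or so is needed, one option is to open the file and read just that line with fgetl rather than loading the whole table. The sketch below uses a hypothetical sample file; whether it actually helps on a slow network drive is an assumption, not something measured here.

```matlab
% Hypothetical sample file standing in for the large .csv.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'var1;10:10:10;1,5;2,5\n');
fprintf(fid, 'var2;10:10:10;16,10;3,25\n');
fclose(fid);

% Read only the first line; the rest of the file is never parsed.
fid = fopen(fname, 'r');
firstLine = fgetl(fid);
fclose(fid);
delete(fname);                   % clean up the sample file
```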

