Textscan to read CSV File (Mixed Format)

12 Ansichten (letzte 30 Tage)
abhay  kaul
abhay kaul am 12 Dez. 2014
Kommentiert: per isakson am 26 Apr. 2015
I am trying to read multiple large CSV files (200,000 X 200 each) into MATLAB 2013a 64bit which contains numbers and strings. The order of numbers and strings is not fixed between files. I am using "Textscan" function. If I use the "%f" format then "textscan" stops execution as soon as it encounters the string so instead I ended up using "%s". But, using "str2double" to convert to numbers takes a long time. "str2double" also converts any non-numeric entry in the data to NaNs and I like using it but the overhead is too much.
I am wondering if anyone else has found a work around for this. I would ideally prefer using "textscan" since the code queries user to specify the line at which column names are in the CSV file and also some lines are commented in the CSV file and textscan is ideal for this file type. But any ideas are appreciated.
Thanks
  1 Kommentar
per isakson
per isakson am 26 Apr. 2015
Did you ever try to read these files with importdata?

Melden Sie sich an, um zu kommentieren.

Antworten (2)

Image Analyst
Image Analyst am 12 Dez. 2014
Apparently you need to inspect the line of text first to determine if it's a comment or not. So use fgetl() to read a string from the file. Then call strtrim() to get rid of any leading whitespace. Then check if the first character in non-numeric, a % symbol, or whatever else indecates a comment in your file. Then use sscanf() or textscan() to extract the numbers into a numerical array.
  1 Kommentar
abhay  kaul
abhay kaul am 12 Dez. 2014
Bearbeitet: abhay kaul am 12 Dez. 2014
Just a clarification on my question. The issue is not to identify if the line that has a comment or not. Textscan does that for me. The data I have has some columns that are numeric whereas some columns have all strings. But the order of columns which are numeric or string is not fixed among multiple files. I should have provided that information in my original questions. Thanks.

Melden Sie sich an, um zu kommentieren.


Star Strider
Star Strider am 12 Dez. 2014
Please post one of your files or a representative sample of one that demonstrates the problem.
As an interim possibility, see if this approach works:
fidi = fopen('filename.csv');
d1 = textscan(fidi, '%f %f', 'Delimiter','\n', 'HeaderLines',11, 'CollectOutput',1);
fseek(fidi,0,0);
d2 = textscan(fidi, '%f %f', 'Delimiter','\n', 'HeaderLines',11, 'CollectOutput',1);
You may have to put that in a loop and subscript ‘d1’ and ‘d2’ and others appropriately. (This is archive code that worked for the file it was intended to read.) You will have to adapt it to your file structure, including an initial 'HeaderLines' option if necessary.
The code reads starting with the initial string and the numerical data following until it gets to the next string (here assumed to have the same structure) where it stops. The fseek function automatically re-starts textscan reading the file at the correct position at the second textscan call. It then reads that string and the subsequent data until it gets to the end.
  2 Kommentare
abhay  kaul
abhay kaul am 12 Dez. 2014
Here is an example file.
File 1
# File 1
# This line should be ignored
ParaA, ParaB, ParaC, ParaD
ab, 1,2,3
bc, 4,5,6
%%%%%%%%%%%%%%%%%
File 2
# File 2
ParaM, ParaN, ParaO, ParaP, ParaQ
# this line should be ignored
1,2,ab,4,12:40:50
3,4,cd,5,12:40:51
Please see that I am able to read all these file types properly with Textscan if I specify data type to be string. But the overhead of str2double is too high. So, I was wondering if there is some way to read all as %f and have MATLAB store non-numeric columns as NaN.
Thanks
Star Strider
Star Strider am 12 Dez. 2014
You would likely need to read them in with different format strings for each file. The first one might be:
'%s%f%f%f'
the second:
'%f%f%s%f%s'
reading the time in as a string, since it probably is. I see no way around that otherwise. You could parse the inputs to store only the numeric data and ignore the strings after you read each file.

Melden Sie sich an, um zu kommentieren.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by