
Is there a way to efficiently read a .csv file into a dataset in MATLAB?

Sarutahiko on 11 Sep 2013
Ok, so here is the deal.
I have a 2.5 GB CSV file. I'd like to load it as a dataset so that I can use the dataset indexing functionality (for example, grabbing the rows where a variable has a certain value).
Here are some sample lines:
rs180759811,1,83977,0.0078454,0.99052,0.512,'0000','1010',0.45,.,.,F,.,.,.,.,.,.,imputed,
rs188652299,1,84156,0.0012772,0.99851,0.50381,'0000','1100',0.65,.,.,R,.,.,.,1,.,.,imputed,
rs192830046,1,86282,0.00080435,0.99911,0.59506,'0000','1111',0,.,.,R,.,1,.,.,.,.,imputed,
rs146027550,1,88429,0.018998,0.97847,0.53261,'0000','1001',0.2,.,.,R,.,.,.,1,.,.,imputed,
rs187571096,1,114699,0.010444,0.98884,0.5583,'0000','1000',0.65,.,.,R,.,.,.,1,.,.,imputed,
rs191891026,1,171529,0.011039,0.98724,0.51818,'0000','1001',0.2,.,.,R,.,.,.,1,.,.,imputed,
But, as I see it, there is not a good way to go from csv --> dataset.
Here are the options I've been considering:
fgetl --> regexp --> cell array --> cell2dataset
I know I can get that to work, but it can't be the most efficient way.
textscan --> textscan allows me to specify a comma as the delimiter, which is useful, but I am not even sure whether I can read one line at a time with textscan.
csvread --> will not work because most of the values are not numeric.
Is there another option that will turn a csv directly into an array or dataset without having to treat it as strings, regexp it, the whole 9 yards?
Thanks very much.

Answers (1)

Walter Roberson on 11 Sep 2013
You can read a line at a time with textscan() by specifying a count of 1 right after the format. But why not read it all with textscan() and then cell2dataset() the result, possibly after a horzcat()?
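cellinput = textscan(fid, '%s%f%f%f%f%f%s%s%f%s%s%s%s%s%s%s%s%s%s%s', 'Delimiter', ',');
isnum = cellfun(@isnumeric, cellinput);  % the %f columns come back as numeric vectors, not cells
cellinput(isnum) = cellfun(@num2cell, cellinput(isnum), 'UniformOutput', false);
ds = cell2dataset( horzcat(cellinput{:}), 'ReadVarNames', false );  % the first row is data, not variable names
textscan() returns a cell row vector with one member per column: a cell column vector for each %s and a numeric column vector for each %f. The numeric members have to be wrapped with num2cell() first, because horzcat() cannot join a cell column with a numeric column. The horzcat() then takes the result from being a cell row vector of column vectors into being a row-and-column cell array.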
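To make the shapes concrete, here is a minimal sketch that runs the same format over two of the sample lines from the question, held in a string so that no file is needed. The variable names are only for illustration, and the num2cell() step is an assumption about what horzcat() needs when the columns mix cells and numeric vectors. The first call also shows the count of 1 mentioned above.

```matlab
% Two sample lines from the question, kept in memory instead of a file.
row1 = 'rs180759811,1,83977,0.0078454,0.99052,0.512,''0000'',''1010'',0.45,.,.,F,.,.,.,.,.,.,imputed,';
row2 = 'rs188652299,1,84156,0.0012772,0.99851,0.50381,''0000'',''1100'',0.65,.,.,R,.,.,.,1,.,.,imputed,';
txt  = sprintf('%s\n', row1, row2);

% 20 specifiers: 19 values plus the empty field after the trailing comma.
fmt = '%s%f%f%f%f%f%s%s%f%s%s%s%s%s%s%s%s%s%s%s';

% A count of 1 right after the format reads a single record.
first = textscan(txt, fmt, 1, 'Delimiter', ',');

% Without a count, every record is read.
cols = textscan(txt, fmt, 'Delimiter', ',');

% Wrap the numeric (%f) columns so that every member is a cell column.
isNum = cellfun(@isnumeric, cols);
cols(isNum) = cellfun(@num2cell, cols(isNum), 'UniformOutput', false);

% One row per record, one column per field.
tbl = horzcat(cols{:});
disp(size(tbl))
```

The resulting cell array is what would be handed to cell2dataset(), which needs the Statistics Toolbox and is therefore left out of the sketch.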
For lack of better information about the file, each column after the last consistently numeric column is read in as a separate string. If you know that a certain column there will always be a useless ".", then switch the corresponding %s to %*s so that it is skipped. But for the column that is either 1 or ".", do not switch that to %g, as %g will not gracefully match a "." in that column.
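The advice about %*s and the 1-or-"." column could look like the sketch below. Several things here are assumptions rather than facts about the real file: that columns 10 and 11 are always "." (they are in the sample lines, nothing more), that the 1-or-"." column of interest is column 16, and that converting it afterwards with str2double() is acceptable. The column is still read with %s, as recommended above.

```matlab
% Two sample lines from the question, kept in memory instead of a file.
row1 = 'rs180759811,1,83977,0.0078454,0.99052,0.512,''0000'',''1010'',0.45,.,.,F,.,.,.,.,.,.,imputed,';
row2 = 'rs188652299,1,84156,0.0012772,0.99851,0.50381,''0000'',''1100'',0.65,.,.,R,.,.,.,1,.,.,imputed,';
txt  = sprintf('%s\n', row1, row2);

% Columns 10 and 11 are assumed to be always "." and are skipped with %*s,
% so textscan returns 18 columns instead of 20.
fmt = ['%s%f%f%f%f%f%s%s%f', ...   % columns 1-9
       '%*s%*s', ...               % columns 10-11, skipped
       '%s%s%s%s%s%s%s%s%s'];      % columns 12-20, still read as strings
cols = textscan(txt, fmt, 'Delimiter', ',');

% File column 16 is now output column 14 (two columns were skipped).
% It is read as text; str2double maps '1' to 1 and '.' to NaN.
flag = str2double(cols{14});
disp(flag)
```

Keeping the column as %s and converting afterwards means a stray "." never interrupts the read; the cost is one extra pass over that column.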
