Fastest way to import single column?

I am attempting to import just a single column (a text date field) from a large file with many columns and many rows. When I use textscan to import only that column, it still takes a very long time (the file is approximately 700 MB). This leads me to believe that textscan reads the entire file and returns only the column I specify. Can someone offer a faster way to import a single column from a large text file? The file can have any number of rows and any number of columns.
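Currently I am doing the following:
% Import the date field only, to check the latest date in the file
fid = fopen(filename, 'r');   % filename: path to the large text file
frmt = '%*s %s %*[^\n]';      % skip column 1, read column 2, skip rest of line
dates = textscan(fid, frmt, 'Delimiter', ',', 'HeaderLines', 1);
fclose(fid);
dates = dates{1};             % cell array of date strings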
Thanks a lot! Brian

1 Comment

José-Luis on 13 Jan 2014
There's no way around reading most of the file for this kind of problem, unless you do some kind of preprocessing or generate only the data you need.

Answers (2)

David Sanchez on 13 Jan 2014

0 votes

From MATLAB's documentation:
Using a text editor, create a file scan1.dat that contains data in the following form:
09/12/2005 Level1 12.34 45 1.23e10 inf Nan Yes 5.1+3i
10/12/2005 Level2 23.54 60 9e19 -inf 0.001 No 2.2-.5i
11/12/2005 Level3 34.90 12 2e5 10 100 No 3.1+.1i
Read the first column of the file into a cell array, skipping the rest of the line:
fid = fopen('scan1.dat');
dates = textscan(fid, '%s %*[^\n]');
fclose(fid);

2 Comments

Brian on 13 Jan 2014
David, thanks for the reply. If you look at my code you can see that I am already doing exactly what you suggested. I am wondering if there is a faster method than the one I'm currently using.
David Sanchez on 13 Jan 2014
Reading from a file is usually time-consuming. Reading a file this big (700 MB) will take a long time no matter how you do it. The method you are using might be the best solution for the task.
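If the real goal is only the latest date, as in the question, chunking cannot avoid reading the whole file, but it can keep memory bounded: pass a repeat count N to textscan and keep a running maximum instead of every row. This is a minimal sketch under assumptions not stated in the thread: dates are in column 2, formatted mm/dd/yyyy, with one header line. The demo writes a small hypothetical CSV so it runs on its own. It is not expected to be faster than one textscan call; time it on your own data.

```matlab
% Sketch: track only the latest date while scanning a large CSV in chunks.
% Assumptions (hypothetical, not from the thread): dates are in column 2,
% formatted 'mm/dd/yyyy', and the file has exactly one header line.

% Write a small hypothetical demo file so the sketch runs on its own.
filename = [tempname '.csv'];
fid = fopen(filename, 'w');
fprintf(fid, 'id,date,value\n');
fprintf(fid, '1,01/15/2013,3.5\n');
fprintf(fid, '2,12/31/2013,4.0\n');
fprintf(fid, '3,06/01/2013,2.1\n');
fclose(fid);

frmt = '%*s %s %*[^\n]';  % skip column 1, keep column 2, skip rest of line
chunkSize = 2;            % rows per textscan call; use something like 1e5 for real files
latest = -Inf;            % running maximum, stored as a datenum

fid = fopen(filename, 'r');
fgetl(fid);               % skip the single header line
while ~feof(fid)
    % The third argument N applies the format N times, i.e. reads N rows
    c = textscan(fid, frmt, chunkSize, 'Delimiter', ',');
    if isempty(c{1})
        break;            % nothing left to parse
    end
    latest = max(latest, max(datenum(c{1}, 'mm/dd/yyyy')));
end
fclose(fid);
delete(filename);

disp(datestr(latest, 'mm/dd/yyyy'))   % latest date found in the demo file
```

Only one chunk of date strings is held in memory at a time, which matters more than speed once the column itself no longer fits comfortably in RAM.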

Kelly Kearney on 13 Jan 2014

0 votes

It's not a Matlab solution, but I have found that pre-processing in Perl (split columns and spit out a one-column intermediate file) and then using load to bring it into Matlab is faster than textscan.

4 Comments

Brian on 13 Jan 2014
Thanks for the suggestion, but I don't really have access to Perl, and I work in a place where integrating new languages is easier said than done. Do you know of any MATLAB-based solutions that would beat textscan? Maybe a lower-level type of import?
Kelly Kearney on 13 Jan 2014
Edited: Kelly Kearney on 13 Jan 2014
Matlab comes packaged with a version of Perl (see doc perl), so it's very easy to integrate into Matlab code without any extra installations.
Here's an example script (save it as parsedata.pl somewhere on the MATLAB path):
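#! perl -w
#
# parsedata.pl: write the 2nd comma-separated column of the input file to onecoltemp.txt
use strict;
use warnings;
open(my $out, '>', 'onecoltemp.txt') or die "Cannot open onecoltemp.txt: $!";
while (my $line = <>) {
    chomp $line;                     # drop the trailing newline
    my @columns = split /,/, $line;  # break the line up at commas
    print $out "$columns[1]\n";      # Perl arrays are 0-based: [1] is the 2nd column
}
close $out;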
Then call
perl('parsedata.pl', filename);
That should dump just the 2nd column (note that Perl is 0-based, hence $columns[1]) into a file named onecoltemp.txt, which you can then read into Matlab. How you read the column in will depend on the format of that data; if you can reduce it to something that can be read via load, that will be the quickest.
Brian on 14 Jan 2014
That seems simple enough that I may give it a try, but I am hesitant to put it into production because our system is supported by multiple users. While I may take the time to understand it, others may not. Is there any way to do something similar entirely within Matlab?
Kelly Kearney on 14 Jan 2014
I don't think so... I've always found textscan to be the fastest Matlab-only way of reading an ASCII file. As others have said, there's really no way around reading the whole file and then extracting what you need. The Perl program does exactly the same thing: read a full line, break it up at the commas, keep just one column. Perl just happens to be very good and very fast at churning through large amounts of text.
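The read, split, keep-one-column idea described above can also be sketched in MATLAB alone: read the raw file in one fread call and hand the resulting character vector to textscan, which accepts text as well as a file identifier. This still reads the whole file, and nothing in this thread establishes that it beats calling textscan on the file identifier, so treat it as something to benchmark. Assumptions (not from the thread): comma-delimited, one header line, dates in column 2, and enough memory for the whole file as a char array (MATLAB chars take 2 bytes each, so a 700 MB file needs roughly 1.4 GB). The demo file is hypothetical.

```matlab
% Sketch: read the whole file in one fread call, then parse the text in memory.
% Assumptions (hypothetical): comma-delimited, one header line, dates in
% column 2, and enough RAM for the file contents as a char array.

% Write a small hypothetical demo file so the sketch runs on its own.
filename = [tempname '.csv'];
fid = fopen(filename, 'w');
fprintf(fid, 'id,date,value\n1,01/15/2013,3.5\n2,12/31/2013,4.0\n');
fclose(fid);

fid = fopen(filename, 'r');
buf = fread(fid, Inf, '*char')';   % entire file as one 1-by-N char vector
fclose(fid);
delete(filename);

% textscan also accepts a character vector in place of a file identifier
c = textscan(buf, '%*s %s %*[^\n]', 'Delimiter', ',', 'HeaderLines', 1);
dates = c{1};                      % cell array of date strings
```

Comparing tic/toc timings of this against the question's textscan(fid, ...) call on the real file is the only reliable way to know which is quicker there.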

Asked: 13 Jan 2014

Commented: 14 Jan 2014
