How to read data from desired lines of a large data set?

George on 5 Oct 2012
Dear all, I want to read specific lines from a large data set (>50 GB). The file is too large to load in full by simply invoking textscan.
What I have come up with is:
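fid = fopen('data.dat');
nline = 0;            % index of the line just read
wline = 1000:10^7;    % the wanted lines (ascending)
i = 1;                % index into wline
% stop at end of file or once every wanted line has been collected
while ~feof(fid) && i <= numel(wline)
    ldata = fgets(fid);
    nline = nline + 1;
    if nline == wline(i)
        datas{i} = ldata;   % each line is a char vector, so store it in a cell
        i = i + 1;
    end
end
fclose(fid);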
As you can see, this loop is really time consuming. My questions are:
1. Is there any function to read the file faster (on a Unix system)?
2. Is it possible to use a pointer, so that only the desired lines are read?
Thank you.
The data set has 10^9 lines and 4 columns:
0 0 0 0.5
0 0.05 200.05 1 ...

José-Luis on 5 Oct 2012
Edited: José-Luis on 5 Oct 2012
That is one big chunk of data. I have several suggestions:
  • Preallocate: in your code you are growing datas at each iteration. Preallocate using, e.g.
datas = ones(numLines,5);
This might not be a viable option if you want to allocate a 10^9 x 5 matrix, which would need about 40 GB as doubles.
  • Split your data into several chunks that you can read when needed. Look at the Unix split utility.
  • Use a database.
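The first two suggestions can be combined without leaving MATLAB. The following is only a rough sketch: it assumes the wanted lines form one contiguous range and that every line holds four whitespace-separated numbers, as in the sample above. The function name readLineRange and the blockSize argument are made up for illustration. It preallocates the result once and then fills it block by block with textscan, using 'HeaderLines' on the first call to skip the lines before the range.

```matlab
function datas = readLineRange(fname, firstLine, lastLine, blockSize)
% Hypothetical helper: read lines firstLine..lastLine of a text file with
% four numeric columns into a preallocated matrix, blockSize lines at a time.
nWanted = lastLine - firstLine + 1;
datas   = zeros(nWanted, 4);            % preallocate once

fid = fopen(fname, 'r');
if fid == -1
    error('Could not open %s', fname);
end
cleanup = onCleanup(@() fclose(fid));   % close the file even on error

nRead = 0;
skip  = firstLine - 1;                  % lines to skip before the first block
while nRead < nWanted
    n = min(blockSize, nWanted - nRead);
    C = textscan(fid, '%f %f %f %f', n, ...
                 'HeaderLines', skip, 'CollectOutput', true);
    block = C{1};
    if isempty(block)
        break;                          % reached end of file early
    end
    rows = size(block, 1);
    datas(nRead+1 : nRead+rows, :) = block;
    nRead = nRead + rows;
    skip  = 0;                          % only the first call skips lines
end
datas = datas(1:nRead, :);              % trim if the file ended early
end
```

For the range in the question this could be called as datas = readLineRange('data.dat', 1000, 10^7, 10^6); the block size of 10^6 is an arbitrary choice and should be tuned to the available memory. The skipped lines still have to be scanned for line breaks, so this avoids the per-line loop overhead but not the I/O.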
If you want to read just one line and know its exact position (in bytes from the beginning of the file), you could always try fseek.
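As an illustration of the fseek route, here is a minimal sketch. It rests on a strong assumption that does not hold for the sample shown in the question: every line must occupy exactly the same number of bytes, newline included, so that the offset of line k is (k-1)*bytesPerLine. The helper name readFixedWidthLine and the bytesPerLine argument are hypothetical. For variable-length lines you would first have to record the byte offsets yourself, for example with ftell during a single pass.

```matlab
function txt = readFixedWidthLine(fname, lineNo, bytesPerLine)
% Hypothetical helper: return line number lineNo (1-based) of a file in
% which every line occupies exactly bytesPerLine bytes, newline included.
fid = fopen(fname, 'r');
if fid == -1
    error('Could not open %s', fname);
end
cleanup = onCleanup(@() fclose(fid));   % close the file even on error

offset = (lineNo - 1) * bytesPerLine;   % byte position where the line starts
if fseek(fid, offset, 'bof') ~= 0       % fseek returns 0 on success
    error('Could not seek to line %d', lineNo);
end
txt = fgetl(fid);                       % read the line without its newline
end
```

The returned text can then be converted with, e.g., vals = sscanf(txt, '%f')'. Because the file pointer jumps straight to the line, the cost does not depend on how far into the file the line is.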

