MATLAB Answers

How to read data from desired lines of a large data set?

George on 5 Oct 2012
Dear all, I want to read selected lines from a large data set (>50 GB). The file is too big to load in full by simply calling textscan.
What I have come up with is:
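fid = fopen('data.dat');
nline = 0; % index of the line just read
wline = 1000:10^7; % the wanted lines, in ascending order
lastLine = max(wline); % computed once, not on every iteration
i = 1; % index into wline
datas = {}; % wanted lines, stored as text
while ~feof(fid) && nline < lastLine
    ldata = fgets(fid);
    nline = nline + 1;
    if nline == wline(i)
        datas{i} = ldata;
        i = i + 1;
    end
end
fclose(fid);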
As you can see, this loop is very time consuming. My questions are:
1. Is there a function that reads the file faster (on a Unix system)?
2. Is it possible to use a file pointer, so that only the desired lines are read?
Thank you.
The data set has 10^9 lines and 4 columns:
0 0 0 0.5
0 0.05 200.05 1 ...

Answers (1)

José-Luis on 5 Oct 2012
Edited: José-Luis on 5 Oct 2012
That is one big chunk of data. I have several suggestions:
  • Preallocate: in your code you are growing datas at each iteration. Preallocate using, e.g.
datas = ones(numLines,4);
This might not be a viable option if you want to allocate a 10^9 x 4 matrix, which needs about 32 GB as doubles.
  • Split your data into several chunks that you can read when needed. Look at the Unix split utility.
  • Use a database.
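To illustrate the preallocation and chunking suggestions together, here is a minimal sketch rather than a definitive implementation. It assumes a whitespace-separated numeric file with exactly four columns and a contiguous range of wanted lines. The function name readLineRange and all of its parameters are hypothetical, made up for this example.

```matlab
function datas = readLineRange(fname, firstLine, lastLine, blockSize)
% readLineRange  Hypothetical helper: read lines firstLine..lastLine of a
% whitespace-separated, 4-column numeric text file, blockSize rows at a time.
nWanted = lastLine - firstLine + 1;
datas   = zeros(nWanted, 4);              % preallocate once, never grow

fid = fopen(fname, 'r');
if fid == -1
    error('Could not open %s', fname);
end
closer = onCleanup(@() fclose(fid));      % close the file on exit or error

for k = 1:firstLine-1                     % skip the unwanted leading lines
    if ~ischar(fgetl(fid)), break; end    % fgetl returns -1 at end of file
end

row = 0;                                  % rows stored so far
while row < nWanted
    n = min(blockSize, nWanted - row);    % rows to request in this block
    C = textscan(fid, '%f %f %f %f', n, 'CollectOutput', true);
    block = C{1};                         % n-by-4 matrix (fewer rows at EOF)
    if isempty(block), break; end         % file ended early
    datas(row+1:row+size(block,1), :) = block;
    row = row + size(block,1);
end
datas = datas(1:row, :);                  % trim if the file was shorter
end
```

For the range in the question, a call could look like datas = readLineRange('data.dat', 1000, 10^7, 10^6). The block size is a guess and should be tuned to the available memory. textscan also has a 'HeaderLines' option that can replace the skipping loop on the first call.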
If you want to read just one line and know its exact position (in bytes from the beginning of the file), you could always try fseek.
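The byte positions are usually not known in advance when lines differ in length, as they seem to in the sample data. One possible approach, sketched here under that assumption, is to make a single sequential pass that records the offset of every stride-th line with ftell, and then to jump with fseek. The helper names indexLines and readLine and the stride parameter are hypothetical, not part of MATLAB.

```matlab
function offsets = indexLines(fname, stride)
% indexLines  Hypothetical helper: byte offsets (from the start of the file)
% of lines 1, 1+stride, 1+2*stride, ... collected in one sequential pass.
fid = fopen(fname, 'r');
if fid == -1
    error('Could not open %s', fname);
end
closer = onCleanup(@() fclose(fid));      % close the file on exit or error

offsets = zeros(1024, 1);                 % doubled in size whenever it fills up
count = 0;                                % index entries stored so far
nline = 0;                                % lines read so far
while true
    pos = ftell(fid);                     % offset of the line about to be read
    if ~ischar(fgetl(fid)), break; end    % fgetl returns -1 at end of file
    if mod(nline, stride) == 0
        count = count + 1;
        if count > numel(offsets)
            offsets(2*numel(offsets)) = 0;
        end
        offsets(count) = pos;
    end
    nline = nline + 1;
end
offsets = offsets(1:count);
end

function txt = readLine(fname, offsets, stride, lineNo)
% readLine  Hypothetical helper: fseek to the nearest indexed line at or
% before lineNo, then read forward at most stride-1 lines.
k = floor((lineNo - 1) / stride) + 1;     % index entry that covers lineNo
fid = fopen(fname, 'r');
if fid == -1
    error('Could not open %s', fname);
end
closer = onCleanup(@() fclose(fid));
if fseek(fid, offsets(k), 'bof') ~= 0
    error('fseek failed: %s', ferror(fid));
end
for skip = 1:mod(lineNo - 1, stride)      % step over the lines in between
    fgetl(fid);
end
txt = fgetl(fid);                         % the wanted line, without newline
end
```

Building the index still costs one full pass over the file, so this only pays off when lines are looked up repeatedly. A larger stride keeps the index small at the price of a few extra fgetl calls per lookup.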
