How to write data to a binary file at a specific position?
Hello,
Let us say that my data looks like this:
data = [1,1,1,1,1;...
2,2,2,2,2; ...
3,3,3,3,3];
I would like to write this data to a binary file such that the values are stored as [1;2;3;1;2;3;1;2;3;1;2;3; ... and so on].
For a small file, I can easily do this with fwrite(fp, data(:), 'int16'); However, for a very large data file (where the data size is 100*1e10 or more), it becomes extraordinarily slow. The raw data is stored as separate files for each row, so I can read the data row by row. So, is it possible to write data to a binary file at a specific position?
Thank you for your help!
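To answer the literal question: yes, fseek plus fwrite can place each value at a computed byte offset. The sketch below is a minimal illustration on a toy-sized int16 matrix; the temporary file name and the step of pre-sizing the file with zeros are assumptions of this example. Element (r, c) goes to 0-based element index (c-1)*nRow + (r-1), which is exactly its position in data(:). It is correct, but it issues one seek per value, which is why this approach gets very slow for large data.

```matlab
% Toy sizes; the real case would be 100 rows by 1e10 columns.
nRow  = 3;
nCol  = 5;
width = 2;                                     % bytes per int16 element
data  = int16(repmat((1:nRow).', 1, nCol));    % rows of 1s, 2s, 3s as in the question

fname = [tempname, '.bin'];                    % hypothetical scratch file
[fid, msg] = fopen(fname, 'w');
assert(fid > 0, msg);
% Pre-size the file with zeros so every fseek target lies inside it.
fwrite(fid, zeros(nRow * nCol, 1, 'int16'), 'int16');

% Write row by row; element (r, c) goes to 0-based element index
% (c - 1) * nRow + (r - 1), i.e. its column-major position.
for r = 1:nRow
    row = data(r, :);                          % in practice: read from the file of row r
    for c = 1:nCol
        fseek(fid, ((c - 1) * nRow + (r - 1)) * width, 'bof');
        fwrite(fid, row(c), 'int16');
    end
end
fclose(fid);

% Read the file back to compare it with the column-major order of data.
[fid, msg] = fopen(fname, 'r');
assert(fid > 0, msg);
out = fread(fid, Inf, '*int16');
fclose(fid);
delete(fname);
```

After running it, out equals data(:), i.e. the values 1 2 3 1 2 3 ... as requested.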
NeuronDB
on 25 Mar 2022
Jan
on 25 Mar 2022
Writing a 100x1e10 array as UINT16 means 2 TB. Of course writing this takes time. But I'm also impressed by your computer, which must be able to hold these 2 TB in RAM beforehand. Is this really the case?
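The size estimate is plain arithmetic: rows × columns × bytes per element. A quick check, using typecast only to count the bytes of one int16 value (the double line is added for comparison, since double uses 8 bytes per element):

```matlab
nRow = 100;                 % rows, from the question
nCol = 1e10;                % columns, from the question
bytesPerElem = numel(typecast(int16(0), 'uint8'));   % 2 bytes per int16
nBytes = nRow * nCol * bytesPerElem;                 % 2e12 bytes
fprintf('int16:  %.0f TB\n', nBytes / 1e12);
fprintf('double: %.0f TB\n', nRow * nCol * 8 / 1e12);
```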
Writing or importing data row-wise is also time consuming, because MATLAB stores values in column-major order. But according to your question, you have decided on this structure. What exactly does "write data to a binary file in a specific position" mean? Of course this works with fseek and fwrite, as _ has already mentioned. But this is much slower than writing the data in contiguous blocks.
Why do you use dummy data, if you have already created some test data?
What is the purpose of this code:
output_data = zeros(nrows*rowsize,1);
for i = 1:nrows
this_row = data(i, :); % This is meant, isn't it?
output_data(i:nrows:end) = this_row;
end
It is an expensive version of:
output_data = data(:);
But you have already written this line. Therefore I do not understand what the second code is meant to demonstrate. Simply omit the expensive loop.
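A small check of that equivalence, on toy sizes chosen only for this example: the loop scatters row i to positions i, i+nrows, i+2*nrows, ..., which is exactly the column-major order that data(:) produces.

```matlab
nrows   = 4;
rowsize = 6;
data = randi([0, 32767], nrows, rowsize, 'int16');

% The loop from the question: row i goes to positions i, i+nrows, i+2*nrows, ...
output_data = zeros(nrows * rowsize, 1, 'int16');
for i = 1:nrows
    output_data(i:nrows:end) = data(i, :);
end

% Column-major linear indexing produces the same vector directly.
same = isequal(output_data, data(:));
```

same is true for any input, so the loop only adds cost.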
Let's start with some test data:
rowsize = 1e7;
nrows = 10;
data = randi([0, 32767], nrows, rowsize, 'int16');
What do you want to do now? What is the relation between the code shown and the question about writing data at specific positions in a file?
By the way, fopen has not had a 'b' mode for over 20 years now. Simply use 'W'.
NeuronDB
on 25 Mar 2022
Accepted Answer
% Some test data storing the rows in different files:
nRow = 10;
nCol = 1e6;
for k = 1:nRow
[fid, msg] = fopen(sprintf('file%02d.bin', k), 'W');
assert(fid > 0, msg);
data = randi([0, 32767], nCol, 1, 'int16');
fwrite(fid, data, 'int16');
fclose(fid);
end
% *** Version 1: insert data in chunks into the file:
tic
% Create the output file:
[ofid, msg] = fopen('matrix1.bin', 'W');
assert(ofid > 0, msg);
% Pre-allocate the output file (not really needed):
width = 2; % Bytes per element
skip = (nRow - 1) * width;
fwrite(ofid, 0, 'int16', (nRow * nCol - 1) * width);
% Loop over input files:
for k = 1:nRow
[ifid, msg] = fopen(sprintf('file%02d.bin', k), 'r');
assert(ifid > 0, msg);
data = fread(ifid, Inf, '*int16');
fclose(ifid);
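% Insert in output file in chunks. fwrite applies 'skip' before EVERY
% value, so the first element of row k is written on its own:
fseek(ofid, (k-1) * width, 'bof');
fwrite(ofid, data(1), 'int16');
% The remaining elements follow with a stride of nRow elements:
fseek(ofid, k * width, 'bof');
fwrite(ofid, data(2:nCol), 'int16', skip);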
end
fclose(ofid);
toc;
% *** Version 2: Join array in the memory:
tic
% Loop over input files:
data = zeros(nRow, nCol, 'int16');
for k = 1:nRow
[ifid, msg] = fopen(sprintf('file%02d.bin', k), 'r');
assert(ifid > 0, msg);
data(k, :) = fread(ifid, Inf, '*int16');
fclose(ifid);
end
% Write output file at once:
[ofid, msg] = fopen('matrix2.bin', 'W');
assert(ofid > 0, msg);
fwrite(ofid, data, 'int16');
fclose(ofid);
toc;
Timings on my i5, Matlab R2018b, SSD:
Elapsed time is 46.099363 seconds. % Insert on disk
Elapsed time is 0.060289 seconds. % Insert in memory
This means that joining the data in RAM is much faster than writing it to disk with skipping.
This might be different if you convert the imported data to double, which uses 8 bytes per element instead of 2 bytes for int16. Then the available RAM may be exhausted, and the computer may store the data in the much slower virtual memory.
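Why Version 1 writes data(1) separately: fwrite's skip argument is applied before every value, including the first one. A small check on an 8-byte scratch file (the temporary file name is an assumption of this demo):

```matlab
fname = [tempname, '.bin'];                    % hypothetical scratch file
[fid, msg] = fopen(fname, 'w+');               % read and write access
assert(fid > 0, msg);
fwrite(fid, zeros(4, 1, 'int16'), 'int16');    % 8 bytes of zeros
fseek(fid, 0, 'bof');
% skip = 2 bytes is applied before EACH value, also before the first one,
% so 10 lands at byte 2 and 20 at byte 6:
fwrite(fid, int16([10, 20]), 'int16', 2);
fseek(fid, 0, 'bof');                          % reposition before reading
out = fread(fid, Inf, '*int16');
fclose(fid);
delete(fname);
```

out is [0; 10; 0; 20]. Seeking to (k-1)*width and writing all of row k with skip would therefore shift every element of the row by one stride.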
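For the 2 TB case, a middle ground between the two versions may help: read the same block of columns from every row file, interleave the block in RAM, and append it to the output contiguously. This is only a sketch under stated assumptions: one block of nRow*chunk values must fit in RAM, the chunk size and temporary file names are illustrative, and it has not been timed at full scale. Each input file is read sequentially and the output file is only appended to, so no fseek is needed.

```matlab
% Toy setup: one int16 file per row, like the test data of the answer.
nRow  = 10;
nCol  = 1e5;
chunk = 3e4;            % columns per block (assumption: nRow*chunk values fit in RAM)

ref   = zeros(nRow, nCol, 'int16');   % kept only to check the result
names = cell(1, nRow);
for k = 1:nRow
    names{k} = [tempname, '.bin'];    % hypothetical scratch files
    ref(k, :) = randi([0, 32767], 1, nCol, 'int16');
    [fid, msg] = fopen(names{k}, 'W');
    assert(fid > 0, msg);
    fwrite(fid, ref(k, :), 'int16');
    fclose(fid);
end

% Open every row file once; each is then read strictly sequentially.
ifid = zeros(1, nRow);
for k = 1:nRow
    [ifid(k), msg] = fopen(names{k}, 'r');
    assert(ifid(k) > 0, msg);
end
outName = [tempname, '.bin'];
[ofid, msg] = fopen(outName, 'W');
assert(ofid > 0, msg);

block = zeros(nRow, chunk, 'int16');
for first = 1:chunk:nCol
    n = min(chunk, nCol - first + 1);            % columns in this block
    for k = 1:nRow
        block(k, 1:n) = fread(ifid(k), n, '*int16').';
    end
    % Column-major write: the block lands in the file already interleaved.
    fwrite(ofid, block(:, 1:n), 'int16');
end
fclose(ofid);
for k = 1:nRow
    fclose(ifid(k));
end

% Read the result back to compare it with the full matrix.
[fid, msg] = fopen(outName, 'r');
assert(fid > 0, msg);
out = fread(fid, Inf, '*int16');
fclose(fid);
delete(outName);
for k = 1:nRow
    delete(names{k});
end
```

The output equals ref(:), the same as writing the whole matrix at once, while only nRow*chunk values are held in memory at a time. The chunk size trades memory for the number of fread/fwrite calls.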
NeuronDB
on 26 Mar 2022