
How to process a large binary file with set skipping patterns

Ivy Chen on 1 Mar 2018
Commented: Ivy Chen on 1 Mar 2018
We need to process very large binary files (500 GB) for plotting. However, reading the file and skipping known patterns seems to take a really long time. Currently it takes about 10 minutes to process 800 MB consisting of multiple 265 MB files.
The code currently in use is below; I would appreciate any suggestions for improving its performance.
BTW, after testing with smaller files, we do eventually intend to read in a large block (at least 1 GB) by increasing the NsperBatch/Kavg parameters.
Filepath = 'xxxxx';
NsperBatch = 2048;
header_size = 123;
Kavg=1200; % Average every K set of values, K=1200 in this case
pattern_to_skip = int16([1024 0 2240 -24500]);
%
%%Header Information
load Header.mat
%
%%Read RF.bin file under the specified path
filename = 'rf.bin';
%
%%Run the data file to identify block to skip and swap types
pattern_to_skip = typecast( swapbytes(pattern_to_skip), 'uint8'); % Convert data types as defined storing sequence - big endian
PL = length(pattern_to_skip); % Length for the skip pattern
fid = fopen(filename,'r');
bytes = reshape( fread(fid, inf, '*uint8'), 1, []); % Original data row vector
fclose(fid);
%
orig_num_bytes = length(bytes);
skiplocs = strfind(bytes, pattern_to_skip); % Find the skip pattern and delete bytes in data file
for idx = fliplr(skiplocs)
bytes(idx:idx+PL-1) = []; % Delete bytes from the original data row vector
end
%
postskip_num_bytes = length(bytes);
fprintf('%d groups of [1024 0 2240 -24500] were skipped\n', (orig_num_bytes - postskip_num_bytes) / PL ); % Number of pattern skipped
%
fileID=fopen('post_rf.bin','w'); % Save the skip-pattern file
fwrite(fileID, bytes);
fclose(fileID);
%
%% Determine RF data size based on defined batches (NsperBatch = 2048 in this case)
filename = 'post_rf.bin';
fid2 = fopen(filename,'r');
magSpectrumMAT_Avg=[];
ix = 0;
s = dir(filename);
while ~feof(fid2)
bytes_in = ftell(fid2);
fprintf('Processed %d bytes out of %d; %.2f%%\n', bytes_in, s.bytes, bytes_in / s.bytes * 100);
magSpectrum=0;
for k=1:Kavg
data=fread(fid2,NsperBatch*2,'int16');
dataIQ = complex(data(1:2:end,:), data(2:2:end,:));
num_dataIQ = length(dataIQ);
dataIQ_GnOff = (Gain * dataIQ) + (Offset+Offset*1i);
dataIQ_GnOff = dataIQ_GnOff * db2mag(Reference_L);
%
target_num_dataIQ = NsperBatch;
if num_dataIQ ~= target_num_dataIQ
fprintf('Note: complex data is not a multiple of %d samples long, padding\n', NsperBatch);
dataIQ_GnOff(target_num_dataIQ) = 0;
end
%
mag_dataIQ_GnOff = abs(fftshift( fft( reshape(dataIQ_GnOff, NsperBatch, []) ) ) / sqrt(NsperBatch)).^2;
magSpectrum = magSpectrum + mag_dataIQ_GnOff;
end
%
ix = ix + 1;
magSpectrum_avg = magSpectrum/Kavg;
magSpectrumMAT_Avg = [magSpectrumMAT_Avg magSpectrum_avg];
end
%
magSpectrumMAT_dB=pow2db(magSpectrumMAT_Avg); % Convert magnitude to dB after taking Average
%
save('rf_magSpectrumMAT_dB.mat','magSpectrumMAT_dB')
%
fclose all;
2 Comments
Walter Roberson on 1 Mar 2018
Your code
for idx = fliplr(skiplocs)
bytes(idx:idx+PL-1) = []; % Delete bytes from the original data row vector
end
is less efficient than it needs to be. Instead use
bytes_to_delete = cell2mat( arrayfun(@(IDX) IDX:IDX+PL-1, fliplr(skiplocs), 'uniform', 0) );
bytes(bytes_to_delete) = [];
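On R2016b or later, implicit expansion can build the same deletion index list without `arrayfun`. This is an untested alternative sketch, not code from the thread:

```matlab
% Each start position in skiplocs expands to PL consecutive byte indices;
% implicit expansion (column + row) builds the whole index matrix at once.
bytes_to_delete = skiplocs(:) + (0:PL-1);  % requires R2016b or later
bytes(bytes_to_delete) = [];
```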
Ivy Chen on 1 Mar 2018
Wow, this replacement greatly improves performance: processing a 265 MB file went from 206 s to 11 s. Thanks so much!
Just wondering: do we need to do this in two steps? Meaning, first search for the skip patterns and save the post-skip file, then read the new file for the IQ data calculation.
Performance-wise, is there a way to have the search/skipping and the IQ reading/calculation done together? And if we do, will it necessarily improve performance? Thanks again!!!
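For what it's worth, the two steps can in principle be fused by keeping the cleaned bytes in memory and reinterpreting them directly, instead of writing post_rf.bin and reading it back. A minimal, untested sketch (assumes the cleaned data fits in RAM and that Gain, Offset, Reference_L, NsperBatch, and Kavg are defined from Header.mat as in the original script):

```matlab
% Read the raw file once and remove the skip pattern in memory.
fid = fopen('rf.bin', 'r');
bytes = reshape(fread(fid, inf, '*uint8'), 1, []);
fclose(fid);

pattern_to_skip = typecast(swapbytes(int16([1024 0 2240 -24500])), 'uint8');
PL = length(pattern_to_skip);
skiplocs = strfind(bytes, pattern_to_skip);
bytes(cell2mat(arrayfun(@(IDX) IDX:IDX+PL-1, skiplocs, 'uniform', 0))) = [];

% Reinterpret the cleaned bytes as int16 samples (native byte order,
% matching fread's default), then run the existing IQ/spectrum loop over
% this array instead of over a second file on disk.
samples = double(typecast(bytes, 'int16'));
dataIQ  = complex(samples(1:2:end), samples(2:2:end)).';
```

This saves one full write and one full read of the post-skip file, which may or may not matter relative to the FFT work; only profiling on the real data will tell.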

Sign in to comment.

Answers (0)

