Why am I getting an invalid file identifier error when using "parfor" but the same function works fine (albeit slow) with "for"?

I am trying to run the code below on a file of binary data. I want to speed up the file reading by using a parfor loop to read the data points, but I keep receiving the error below on the line with the parfor loop. The code seems to work with a plain for loop (I have never run it to completion because it takes too long), so the error seems to be specific to parfor. Any help would be greatly appreciated!
Error I receive:
Invalid file identifier. Use fopen to generate a valid file identifier.
Code:
%% Creating File ID and finding File attributes
fid = fopen(full_file_path, 'r', 'l');
file_info = dir(full_file_path);
%% Reading and making the Header
obj.header.identifier = fread(fid, 6, 'uchar');
obj.header.return_mode = dec2hex(fread(fid, 1, 'uint8'));
obj.header.boresighting_quaternion = fread(fid, 4, 'double');
obj.header.scanner_linear_offset = fread(fid, 3, 'double');
obj.header.resepi_orientation_angles = fread(fid, 3, 'float32');
obj.header.reserved_1 = fread(fid, 1, 'uint64');
obj.header.azimuth_offset = fread(fid, 32, 'float32');
obj.header.elevation_offset = fread(fid, 32, 'float32');
obj.header.device_id = fread(fid, 1, 'uint32');
obj.header.extra_parameters = fread(fid, 32, 'float32');
obj.header.reserved_2 = fread(fid, 40, 'uchar');
%% Data Parsing
switch obj.header.return_mode
case '13'
data_packet_size = 668; %bytes
number_of_points = (file_info.bytes - 512) / data_packet_size;
microseconds_since_gps_epoch = zeros(number_of_points, 1);
theta_angle = zeros(number_of_points, 1);
phi_angle = zeros(number_of_points, 1);
range_1 = zeros(number_of_points, 1);
reflectivity_1 = zeros(number_of_points, 1);
tag_1 = zeros(number_of_points, 1);
range_2 = zeros(number_of_points, 1);
reflectivity_2 = zeros(number_of_points, 1);
tag_2 = zeros(number_of_points, 1);
range_3 = zeros(number_of_points, 1);
reflectivity_3 = zeros(number_of_points, 1);
tag_3 = zeros(number_of_points, 1);
parfor i = 1:number_of_points %Line I receive error on
if i == any(1:30:1000000000)
microseconds_since_gps_epoch(i) = fread(fid, 1, 'uint64');
end
theta_angle(i) = fread(fid, 1, 'uint16');
phi_angle(i) = fread(fid, 1, 'uint16');
range_1(i) = fread(fid, 1, 'uint32');
reflectivity_1(i) = fread(fid, 1, 'uint8');
tag_1(i) = fread(fid, 1, 'uint8');
range_2(i) = fread(fid, 1, 'uint32');
reflectivity_2(i) = fread(fid, 1, 'uint8');
tag_2(i) = fread(fid, 1, 'uint8');
range_3(i) = fread(fid, 1, 'uint32');
reflectivity_3(i) = fread(fid, 1, 'uint8');
tag_3(i) = fread(fid, 1, 'uint8');
end
obj.microseconds_since_gps_epoch = microseconds_since_gps_epoch;
obj.theta_angle = theta_angle;
obj.phi_angle = phi_angle;
obj.range_1 = range_1;
obj.reflectivity_1 = reflectivity_1;
obj.tag_1 = tag_1;
obj.range_2 = range_2;
obj.reflectivity_2 = reflectivity_2;
obj.tag_2 = tag_2;
obj.range_3 = range_3;
obj.reflectivity_3 = reflectivity_3;
obj.tag_3 = tag_3;
end

Answers (1)

Joseph Cheng on 17 Jun 2021
Edited: Joseph Cheng on 17 Jun 2021
From the looks of it you've opened a file and are then trying to use parfor to grab items out of the opened file. That's where things get problematic. A file identifier is only valid in the MATLAB process that called fopen, so the fid created on the client means nothing to the parfor workers, which is what the error is reporting. Beyond that, in a parfor loop there are too many "hands" trying to grab things from a single file, possibly at the same time. With how you've written the loop, you also have no guarantee that things would be parsed correctly even if the parfor worked.
This is because the for loop reads the file sequentially, one iteration after another, whereas once parallelized each worker grabs at the file at the same time and in no particular order. You're better off reading in the whole file and then parsing things out to the variables, or checking whether your data fits what is described here https://www.mathworks.com/matlabcentral/answers/276010-reading-in-large-binary-file-with-multiple-data-types-uint8-double-etc and using memmapfile().
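As a rough illustration of the "read the whole file, then parse" suggestion, here is a minimal sketch. It rests on assumptions that the thread does not confirm: the header occupies the first 512 bytes (the number the question's code subtracts), each 668-byte packet is one uint64 timestamp followed by 30 points of 22 bytes each (inferred from 668 = 8 + 30*22 and the order of the fread calls), and both the file and the host are little-endian, since typecast uses the host byte order. The name parse_packets is hypothetical.

```matlab
function out = parse_packets(full_file_path)
% Hypothetical helper: read all packets with one fread, then slice the bytes.
header_bytes      = 512;  % assumption: the value subtracted in the question
packet_bytes      = 668;  % data_packet_size from the question
points_per_packet = 30;   % assumption: (668 - 8) / 22
point_bytes       = 22;   % 2 + 2 + 3*(4 + 1 + 1)

fid = fopen(full_file_path, 'r', 'l');
assert(fid ~= -1, 'Could not open %s', full_file_path);
cleanup = onCleanup(@() fclose(fid));       % close the file even on error
fseek(fid, header_bytes, 'bof');            % skip the header
raw = fread(fid, Inf, '*uint8');            % all remaining bytes, kept as uint8

% One column per packet; any incomplete trailing packet is dropped.
n_packets = floor(numel(raw) / packet_bytes);
raw = reshape(raw(1:n_packets * packet_bytes), packet_bytes, n_packets);

% The first 8 bytes of every packet hold the timestamp.
out.microseconds_since_gps_epoch = ...
    typecast(reshape(raw(1:8, :), [], 1), 'uint64');

% The remaining 660 bytes hold 30 points; regroup to one column per point.
pts = reshape(raw(9:end, :), point_bytes, points_per_packet * n_packets);
u16 = @(rows) typecast(reshape(pts(rows, :), [], 1), 'uint16');
u32 = @(rows) typecast(reshape(pts(rows, :), [], 1), 'uint32');

out.theta_angle    = u16(1:2);
out.phi_angle      = u16(3:4);
out.range_1        = u32(5:8);
out.reflectivity_1 = pts(9, :).';
out.tag_1          = pts(10, :).';
out.range_2        = u32(11:14);
out.reflectivity_2 = pts(15, :).';
out.tag_2          = pts(16, :).';
out.range_3        = u32(17:20);
out.reflectivity_3 = pts(21, :).';
out.tag_3          = pts(22, :).';
end
```

The fields come back in their stored integer classes rather than as double, so convert with double() where later arithmetic needs it. No loop over points is involved, which is usually where the time goes in the original code.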
Walter Roberson on 10 Jan 2022
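filename = fullfile(tempdir(),'parfor_test.txt');
fclose(fopen(filename, 'w')); %create an empty file
spmd
fids = fopen(filename,'a'); %each worker opens its own handle in append mode
end
fid = [fids{:}]; %gather the per-worker identifiers on the client
parfor i = 1:50
FID = fid(getCurrentTask().ID); %look up this worker's identifier by task ID
fprintf(FID,'iter #%d FID %d\n', i, FID);
end
parfevalOnAll(@() fclose('all'), 0); %close the open files on every worker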
In my tests, the fid came out the same on all workers -- but one should not assume that will always be the case.
Notice that the file must be opened in append mode, and must be opened on each worker individually.
Each individual fprintf() is guaranteed to be written out "atomically" (at least up to 8192 bytes), not mixed up with the output of any other process.
If you were to open a file for reading inside a worker, then each fopen() has its own independent position. Reading information in one process does not "consume" it for the other processes.
If you wanted to have each different worker process a line (or block) of information from the same file, then I would recommend that you have a single worker that reads a line (or block) and then uses parfeval() to queue processing of the content.
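A minimal sketch of that single-reader pattern follows. In this sketch the client session plays the role of the one reader and parfeval() queues each block for the pool. The demo file, block_bytes, and process_block are hypothetical stand-ins for the real file and parsing routine.

```matlab
% Sketch: one sequential reader, with the processing queued onto the pool.
filename = [tempname '.bin'];                 % hypothetical demo file
fid = fopen(filename, 'w');
fwrite(fid, uint8(1:100), 'uint8');
fclose(fid);

block_bytes   = 25;                           % hypothetical block size
process_block = @(bytes) sum(double(bytes));  % placeholder for real parsing

pool = gcp();                                 % starts a pool if none is open
info = dir(filename);
n_blocks = ceil(info.bytes / block_bytes);
futures(1:n_blocks) = parallel.FevalFuture;   % preallocate the future array

fid = fopen(filename, 'r');
for k = 1:n_blocks
    block = fread(fid, block_bytes, '*uint8');             % sequential read
    futures(k) = parfeval(pool, process_block, 1, block);  % queue the work
end
fclose(fid);

results = fetchOutputs(futures);              % one row per block, in queue order
delete(filename);
```

Only the block contents travel to the workers, so no file identifier ever crosses a process boundary. This tends to pay off only when processing a block costs noticeably more than reading it.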
If you have a binary file, then each worker can fseek() to a different location.
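For the fseek() case, a minimal sketch might look like the following. The demo file, the 8-byte record size, and the variable names are hypothetical, and the sketch assumes the workers can see the same file path as the client, as they can with a local pool.

```matlab
% Sketch: each parfor iteration opens its own handle and seeks to its record.
filename     = [tempname '.bin'];             % hypothetical demo file
record_bytes = 8;                             % hypothetical fixed record size
n_records    = 16;
fid = fopen(filename, 'w', 'l');
fwrite(fid, (1:n_records) * 10, 'uint64');    % record k holds the value 10*k
fclose(fid);

values = zeros(n_records, 1);
parfor k = 1:n_records
    wfid = fopen(filename, 'r', 'l');         % opened on the worker itself
    fseek(wfid, (k - 1) * record_bytes, 'bof');
    values(k) = fread(wfid, 1, 'uint64');
    fclose(wfid);
end
delete(filename);
```

Opening and closing the file on every iteration is expensive, so in practice each iteration would read a large contiguous chunk of records rather than a single one.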
Caution: the "sweet spot" for performance for spinning hard drives is usually two processes per hard drive, and only one hard drive per controller (with slower drives and faster controllers, having more per controller can be okay.)
If you are looking for high performance and you are not using a "server", then my understanding is that current optimum performance for spinning media comes from a direct-attached Thunderbolt 4 connection to a RAID controller that is striping between at least two fast drives. But a high quality SSD can do significantly better (provided you have it attached to a good quality controller... which can start to involve Thunderbolt 4 based enclosures.)
