How to read in multiple text files, each containing multiple lines/formats?
Hi
Thanks in advance for reading and for any support. I am trying to read multiple text files in a folder, using the code below. The data come from this Kaggle dataset: https://www.kaggle.com/kmader/pulmonary-chest-xray-abnormalities
files = dir(fullfile('archive.1/ChinaSet_AllFiles/','ChinaSet_AllFiles','ClinicalReadings','*.txt'));
N = length(files)
data = []
for i = 1:N
t = files(i).name;
formatspec = '%s %s%*[^\r\n]%*[\r\n]+%s';
file = fopen(fullfile(files(i).folder,t),'r');
A = textscan(file , formatspec, 'delimiter','\n');
data = [data; A];
fclose(file)
end
The loop runs through the files fine, but the files themselves are not consistent: the second line is sometimes a single word and sometimes a whole phrase, as in the following examples.
Usual Files:
femal 32yrs
normal
Other files:
male 40yrs
PTB in the right upper field
I need three columns for each file, for example: male, 40yrs, "PTB in the right upper field"
Can someone please help?
Answers (2)
dpb
on 16 May 2021
It is very difficult to see the nuances without example files, but I would handle the two records above more like this:
d=dir(fullfile('archive.1/ChinaSet_AllFiles/','ChinaSet_AllFiles','ClinicalReadings','*.txt'));
tData=[]; % empty table placeholder
for i = 1:numel(d) % iterate over dir struct
  fid=fopen(fullfile(d(i).folder,d(i).name),'r'); % open file in turn
  data=textscan(fid,'%s','delimiter','\n','whitespace',''); % read as cellstr() array by record
  data=data{1}; % textscan returns a 1x1 cell; unwrap the cellstr inside it
  tmp=split(strtrim(data{1})); % split the first record into sex, age fields
  tData=[tData;table(tmp(1),tmp(2),data(2),'VariableNames',{'Gender','Age','Diagnosis'})]; % append row to table
  fclose(fid);
end
The above assumes these are the only two record types and that every file follows the same pattern: two fields on the first line and one long record on the second.
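If some files turn out to have blank lines, Windows line endings, or a diagnosis that runs over more than one line, the same idea can be made a little more tolerant. The following is only a sketch under those assumptions, not something tested against the real dataset. The `samples` cell is a stand-in built from the two records shown in the question; in practice each string would presumably come from `fileread` on one file. The variable names `samples`, `recs`, `tok` and `rows` are made up for illustration.

```matlab
% Stand-in data: the two records quoted in the question (assumed file contents).
% In a real run each string might come from fileread(fullfile(d(i).folder,d(i).name)).
samples = { sprintf('femal 32yrs\nnormal'), ...
            sprintf('male 40yrs\r\nPTB in the right upper field\r\n') };

rows = cell(numel(samples),3);                 % one row per file: Gender, Age, Diagnosis
for k = 1:numel(samples)
    recs = strtrim(regexp(samples{k},'\r?\n','split'));     % split into lines, trim each one
    recs = recs(~cellfun('isempty',recs));                  % drop blank lines
    tok  = regexp(recs{1},'\s+','split');                   % first line -> gender, age
    rows(k,:) = {tok{1}, tok{2}, strjoin(recs(2:end),' ')}; % remaining lines -> diagnosis
end
tData = cell2table(rows,'VariableNames',{'Gender','Age','Diagnosis'});
```

Collecting the rows in a preallocated cell array and converting once with `cell2table` avoids growing a table inside the loop, which should matter little for a few hundred files but keeps the loop body simple.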
Mathieu NOE
on 16 May 2021
Hello,
I have to admit that I am not an expert with textscan, so someone else will probably post better code than mine, but here is a workaround that I tried and tested:
files = dir(fullfile('archive.1/ChinaSet_AllFiles/','ChinaSet_AllFiles','ClinicalReadings','*.txt'));
N = length(files);
data = [];
% for i = 1:N
% t = files(i).name;
% formatspec = '%s %s%*[^\r\n]%*[\r\n]+%s';
% file = fopen(fullfile(files(i).folder,t),'r');
% A = textscan(file , formatspec, 'delimiter',' ');
% data = [data; A];
% fclose(file)
% end
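for i = 1:N
  t = files(i).name;
  rr = readlines(fullfile(files(i).folder,t)); % string array, one element per line
  temp = split(rr{1}); % first line -> cell array of words
  % remove empty cells
  empty = cellfun('isempty',temp);
  temp(empty) = [];
  % finally, build one row: words of line 1 followed by the whole of line 2
  A = [temp' rr{2}];
  data = [data; A];
end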
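Since `readlines` returns a string array, the body of the loop can also be written with string functions only. This is just a sketch of that variant: `rr` below is a hand-made stand-in for the result of `readlines` on one file, using the record quoted in the question plus an assumed blank line, and it assumes the first non-blank line always holds exactly the gender and the age.

```matlab
% Stand-in for rr = readlines(...) on one file (values taken from the question;
% the trailing "" is an assumed blank line).
rr = ["male 40yrs"; "PTB in the right upper field"; ""];

rr   = strip(rr);               % trim leading/trailing whitespace on every line
rr   = rr(strlength(rr) > 0);   % drop blank lines
temp = split(rr(1));            % first line -> ["male"; "40yrs"]
A    = [temp(1), temp(2), join(rr(2:end)," ")];   % 1x3 string row: gender, age, diagnosis
```

Rows built this way can be stacked with `data = [data; A];` exactly as in the loop above, giving an N-by-3 string array instead of a cell array.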