Read file with non-uniform lines?

6 Ansichten (letzte 30 Tage)
bene1
bene1 am 25 Okt. 2020
Kommentiert: bene1 am 27 Okt. 2020
Hi. I'm a Matlab newbie. I would like to read in a file where the lines have different formats, as below.
% Coordinates
% Code ID X Y
C 101 0.001 0.001
C 102 1.002 0.002
C 103 1.003 1.003
C 104 0.004 1.004
% Distances
% Code ID From To Dist
D 201 101 103 1.417
D 202 102 104 1.414
If the first character is C, use...
A = textscan(fid,'%c %d %f %f')
If the first character is D, use...
A = textscan(fid,'%c %d %d %d %f')
After, I'd like to assign the data to structs (c.id, c.x, c.y, d.id, d.from, d.to, d.dist), but first I think I just need to get it scanned in. Is it possible to apply some logic to reading the file? Thank you.
  5 Kommentare
Walter Roberson
Walter Roberson am 26 Okt. 2020
'^\s*C.*$', 'dotexceptnewline', 'lineachors'
or
'(?<=(^|\n))\s*C[^\n]*'
with no additional options needed
bene1
bene1 am 26 Okt. 2020
Great, thanks again. Now have...
C =
4×1 cell array
{' C 101 0.001 0.001←'}
{' C 102 1.002 0.002←'}
{' C 103 1.003 1.003←'}
{' C 104 0.004 1.004←'}
With C as a 4x1, I believe my next step is to extract out the columns. My first thought was
A = textscan(C,'%c %d %f %f')
but I see I can't do that. Looking into cell2struct?

Melden Sie sich an, um zu kommentieren.

Akzeptierte Antwort

Walter Roberson
Walter Roberson am 26 Okt. 2020
Named tokens, I said. Do not extract the lines ahead of time.
FileText = fileread(YourFileName);
Ctokens = regexp(FileText, '^\s*C\s+(?<ID>\d+)\s+(?<X>\S+)\s+(?<Y>\S+)', 'names', 'lineanchors');
%Ctokens will now be a struct array with field names ID, X, and Y, each of which are character vectors.
C.ID = str2double({Ctokens.ID});
C.X = str2double({Ctokens.X});
C.Y = str2double({Ctokens.Y});
Dtokens = regexp(FileText, '^\s*D\s+(?<ID>\d+)\s+(?<From>\d+)\s+(?<To>\d+)\s+(?<Dist>\S+)', 'names', 'lineanchors');
%Dtokens will now be a struct array with field names ID, From, To, Dist, each of which are character vectors.
D.ID = str2double({Dtokens.ID});
D.From = str2double({Dtokens.From});
D.To = str2double({Dtokens.To});
D.Dist = str2double({Dtokens.Dist});
Amount of processing work is pretty minimial. Pretty much all of the effort is in figuring out the proper regexp patterns to use (which can be pretty tricky when there are variant lines.)

Weitere Antworten (1)

per isakson
per isakson am 26 Okt. 2020
>> S = cssm( 'd:\m\cssm\cssm.txt' )
S =
1×2 struct array with fields:
header
colhead
Code
data
>> S(1)
ans =
struct with fields:
header: "Coordinates"
colhead: ["Code" "ID" "X" "Y"]
Code: [4×1 string]
data: [4×3 double]
>> S(2)
ans =
struct with fields:
header: "Distances"
colhead: ["Code" "ID" "From" "To" "Dist"]
Code: [2×1 string]
data: [2×4 double]
where
function sas = cssm( ffs )
chr = fileread( ffs );
str = string( chr );
str = replace( str, char([13,10]), newline ); % get rid of the carriage return
% split the string into blocks. Use the block header as delimiter.
[blk,del] = strsplit( str, '(?m)^\x20*%\x20\w+\x20*\n' ...
, 'DelimiterType','RegularExpression' );
blk(1) = []; % remove empty block before the first delimiter
len = numel( del );
sas(1,len) = struct( 'header',"", 'colhead',"", 'Code',"", 'data',nan );
for jj = 1 : len % loop over all blocks
sas(jj).header = regexp( del(jj), '\w+', 'match','once' ); % match the name
cac = textscan( blk(jj), "%[^\n]", 1 ); % read the first row
tmp = strsplit( string(cac{1}) ); % split the row into column headers
tmp(1) = []; % remove the comment character, "%"
sas(jj).colhead = tmp;
cac = textscan( blk(jj), ['%s',repmat('%f',1,numel(tmp)-1)] ...
, 'Headerlines',1, 'CollectOutput',true );
sas(jj).Code = string(cac{1});
sas(jj).data = cac{2};
end
end
and where cssm.txt contains the data given in of your question.

Kategorien

Mehr zu Large Files and Big Data finden Sie in Help Center und File Exchange

Produkte


Version

R2020a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by