# Converting unformatted text to formatted text

Ibro Tutic on 24 Nov 2015
Edited: Ibro Tutic on 20 Apr 2019
I asked this question before and neglected some info, so I want to start fresh to avoid confusion.
I have a text file (attached) with data that is similar to data I am looking at. What I need to do is take, for example, RainflowCycleCounterHistogram[0][0]=0, and map this value, 0, to a matrix location equal to [0,0].
Then, if possible, take RainflowCycleMeanBreakpoints[0,1,2,etc] and put them on the left hand side of the matrix and RainflowCycleRangeBreakpoints[0,1,2,etc] across the top.
I am having some success with the code below, but it seems inefficient compared to some other approaches I have seen, and I get an error when I run it. It runs through the text file and picks out what I need. I have a structure that contains all of the values I need for the matrix, but it won't work because of the error posted below. The error appears to occur when I access matrix(j).info, which doesn't make sense to me, because when I check these fields they look like single numbers, which shouldn't cause a problem.
clear all;
close all
clc
projectdir = 'C:\Users\me\data.psr';
newdir = 'C:\Users\me\Desktop\Test1';
fid=fopen(projectdir,'r');
T=textscan(fid, '%s');
fclose(fid);
for i=8:107
a=T{1,1}{i,1};
b= a(30:48);
matrix(i).r = b(2);
matrix(i).c = b(5);
matrix(i).info = b(8:13);
end
A = zeros(9,9)
for j=8:107
A(matrix(j).r, matrix(j).c) = matrix(j).info;
end;
The error:
Assignment has more non-singleton rhs dimensions than non-singleton subscripts
Error in Untitled2 (line 23)
A(matrix(j).r, matrix(j).c) = matrix(j).info;
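A likely cause, sketched under assumptions: b(2), b(5) and b(8:13) are characters, not numbers. A single character used as a subscript is silently converted to its character code, and b(8:13) is a 1-by-6 char array, which cannot be assigned to one matrix element. The sample slice below is hypothetical and only mimics the layout that a(30:48) appears to assume.

```matlab
% Hypothetical slice in the layout the question's a(30:48) seems to expect
b = '[2][3]:1.5000';

code_of_b2 = double(b(2));        % 50, the character code of '2', not the number 2

% Convert the text to numbers before using it as subscripts or values
r    = str2double(b(2)) + 1;      % zero-based file index -> one-based MATLAB row
c    = str2double(b(5)) + 1;      % zero-based file index -> one-based MATLAB column
info = str2double(b(8:13));       % 1-by-6 char '1.5000' -> scalar 1.5

A = zeros(9,9);
A(r,c) = info;                    % scalar into one element, no dimension mismatch
```

Note that fixed character positions only work while every index is a single digit; the answers below avoid that restriction.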
This answer by user Stephen Cobeldick might help, although it was written to deal only with the histogram. However, it gives an error when run.
% identify digits:
rgx = '[A-Z]+\[(\d+)\]\[(\d+)\]:*(\d+)';
C = regexp(str,rgx,'tokens');
% convert digits to numeric:
M = cellfun(@str2double,vertcat(C{:}));
M(:,1:2) = 1+M(:,1:2);
% convert to linear indices:
out = nan(max(M(:,1)),max(M(:,2)));
idx = sub2ind(size(out),M(:,1),M(:,2));
% allocate values:
out(idx) = M(:,3)
Error using cellfun
Input #2 expected to be a cell array, was double instead.
Error in Untitled3 (line 12)
M = cellfun(@str2double,vertcat(C{:}));
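The cellfun error most likely means that regexp matched nothing: C is then an empty cell, and vertcat(C{:}) returns an empty double rather than a cell array. A minimal sketch of the same idea with a pattern adapted to one possible line layout follows. The sample text and the pattern are assumptions, not the contents of the attached file.

```matlab
% Hypothetical sample text; the real file layout may differ
str = sprintf('%s\n', ...
    'RainflowCycleCounterHistogram[0][0]: 1.0000000000', ...
    'RainflowCycleCounterHistogram[0][1]: 0.0000000000', ...
    'RainflowCycleCounterHistogram[1][0]: 2.0000000000', ...
    'RainflowCycleCounterHistogram[1][1]: 3.0000000000');

% Two bracketed indices, then ':' or '=', then the value
rgx = '\[(\d+)\]\[(\d+)\]\s*[:=]\s*(\S+)';
C = regexp(str, rgx, 'tokens');
assert(~isempty(C), 'Pattern matched nothing; check the line layout.');

% Convert tokens to numbers and shift the indices to one-based
M = cellfun(@str2double, vertcat(C{:}));
M(:,1:2) = M(:,1:2) + 1;

% Allocate and fill through linear indices
out = nan(max(M(:,1)), max(M(:,2)));
out(sub2ind(size(out), M(:,1), M(:,2))) = M(:,3);
```

Checking that C is non-empty before calling cellfun turns the confusing cellfun message into one that points at the pattern.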

per isakson on 25 Nov 2015
Stephen, Thanks for making me aware that I'm wasting my time!
Ibro Tutic on 25 Nov 2015
Yes, I probably should have left the question, but I was under the impression that if I ask the same question again you would feel that your answers weren't good enough. Adding in more and more info that I missed will confuse the person answering the question and me, as I probably don't remember what exactly I had posted before. This was the simplest solution to a small problem, now what I will do is rewrite the original question and put it in there to solve what seems to be the biggest issue in the history of this forum.
I completely understand where you are coming from, but I am not sure that you understand what I am trying to accomplish by posting this new question. Sorry, I guess? I am trying to rectify my mistake and it seems that people are more worried about the fact that I deleted a question rather than trying to "help" with my other questions, judging from what isakson just commented. I am legitimately trying to learn how to do this and people are making MASSIVE deals out of problems that shouldn't be that important (yes, if I just deleted the entire question I would understand, but I clearly stated my intentions). Like I said, yea, I probably messed up deleting the question, but I'm not sure if arguing about that rather than actually helping with the question is the mature thing to do.
It's not like I am consistently deleting every question that I get answered to cover up a trail or something. It was my first time doing it and now I realize that I screwed up in doing so. I remain respectful in every aspect of my questions, giving credit to people who wrote certain code, etc.
Stephen Cobeldick on 25 Nov 2015
I hope that you get the help and information that you need, and have fun learning MATLAB! We do put a lot of effort in when people need it, so please come and ask more questions :)

per isakson on 24 Nov 2015
Edited: per isakson on 28 Nov 2015
I have assumed that the sizes of the resulting arrays are known.
fid = fopen( 'c:\m\cssm\test4.txt' );
rows = textscan( fid, '%s', 'Delimiter', '\n' );
fclose( fid );
rows = rows{:};
str = 'RainflowCycleCounterHistogram'; % avoid magic number
len = length( str );
is_counter = strncmp( str, rows, len );
counter_rows = rows( is_counter );
%
str = 'RainflowCycleMeanBreakpoints';
len = length( str );
is_mean = strncmp( str, rows, len );
mean_rows = rows( is_mean );
%
str = 'RainflowCycleRangeBreakpoints';
len = length( str );
is_range = strncmp( str, rows, len );
range_rows = rows( is_range );
%
counter_matrix = nan( 10, 10 );
for jj = 1 : length( counter_rows )
%
cac = textscan( counter_rows{jj}, '%*s%d%d%f' ...
, 'Delimiter' , ' []:' ...
, 'MultipleDelimsAsOne', true );
%
counter_matrix( cac{1}+1, cac{2}+1 ) = cac{3}; % one based
end
mean_vector = nan( 1, 10 );
for jj = 1 : length( mean_rows )
%
cac = textscan( mean_rows{jj}, '%*s%d%f' ...
, 'Delimiter' , ' []:' ...
, 'MultipleDelimsAsOne', true );
%
mean_vector( 1, cac{1}+1 ) = cac{2}; % one based
end
range_vector = nan( 1, 10 );
for jj = 1 : length( range_rows )
%
cac = textscan( range_rows{jj}, '%*s%d%f' ...
, 'Delimiter' , ' []:' ...
, 'MultipleDelimsAsOne', true );
%
range_vector( 1, cac{1}+1 ) = cac{2}; % one based
end
or maybe better - no assumptions regarding sizes
fid = fopen( 'c:\m\cssm\test4.txt' );
rows = textscan( fid, '%s', 'Delimiter', '\n' );
fclose( fid );
rows = rows{:};
str = 'RainflowCycleCounterHistogram'; % avoid magic number
len = length( str );
is_counter = strncmp( str, rows, len );
counter_rows = rows( is_counter );
%
str = 'RainflowCycleMeanBreakpoints';
len = length( str );
is_mean = strncmp( str, rows, len );
mean_rows = rows( is_mean );
%
str = 'RainflowCycleRangeBreakpoints';
len = length( str );
is_range = strncmp( str, rows, len );
range_rows = rows( is_range );
%
CRS = permute( char( counter_rows ), [2,1] );
cac = textscan( CRS, '%*s%f%f%f' ...
, 'Delimiter' , '[]: '...
, 'MultipleDelimsAsOne' , true ...
, 'CollectOutput' , true );
num = cac{1};
%
sz1 = min( num(:,1:2), [], 1 );
sz2 = max( num(:,1:2), [], 1 );
sz = sz2-sz1+[1,1];
ix_linear = sub2ind( sz, num(:,1)+1, num(:,2)+1 ); % one based
counter_matrix( ix_linear ) = num(:,3);
counter_matrix = reshape( counter_matrix, sz );
MRS = permute( char( mean_rows ), [2,1] );
cac = textscan( MRS, '%*s%f%f' ...
, 'Delimiter' , '[]: '...
, 'MultipleDelimsAsOne' , true ...
, 'CollectOutput' , true );
num = cac{1};
%
mean_vector( num(:,1)+1 ) = num(:,2); % one based
RRS = permute( char( range_rows ), [2,1] );
cac = textscan( RRS, '%*s%f%f' ...
, 'Delimiter' , ' []:'...
, 'MultipleDelimsAsOne' , true ...
, 'CollectOutput' , true );
num = cac{1};
%
range_vector( num(:,1)+1 ) = num(:,2); % one based
hope they return identical results :-)
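The size-free indexing step can be tried in isolation. This is a minimal sketch, assuming num already holds zero-based [row, column, value] triples such as textscan would return; the numbers are made up. It preallocates with nan instead of growing a vector and reshaping, so any entry missing from the file stays NaN.

```matlab
% Hypothetical parsed data: zero-based row, zero-based column, value
num = [0 0 1
       0 1 0
       1 0 2
       1 1 3
       2 0 4
       2 1 5];

% Matrix size from the largest zero-based index in each dimension
sz = max(num(:,1:2), [], 1) + 1;

% Preallocate, then scatter the values through linear indices
counter_matrix = nan(sz);
ix_linear = sub2ind(sz, num(:,1)+1, num(:,2)+1);   % one based
counter_matrix(ix_linear) = num(:,3);
```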
and another iteration
• A function is superior to a script. It doesn't mess with the base workspace. It's easier to debug and it's easier to call from a script or function.
• This function is readable. It's fairly straightforward to add new keywords and row formats.
• The switch case can be replaced by a feval construct. But why do that?
• The subfunctions, f1, f2 and f3, have large parts of their code in common. That asks for further refactoring.
• Allocating a separate sub-function to each type of row makes testing easier.
• If speed becomes a problem, analyze the code with the profiler.
>> S = cssm( 'c:\m\cssm\test4.txt' )
S =
RainflowCycleCounterHistogram: [10x10 double]
RainflowCycleMeanBreakpoints: [-111 100 300 330 360 380 390 400 410 420]
RainflowCycleRangeBreakpoints: [0 35 70 100 135 170 200 230 260 300]
RainflowCycleReversalTolerance: 20
PowerCylinderTemperature: 0
PowerCylinderTemperatureHistogram: [1x12 double]
PowerCylinderTemperatureHistogramBreakpoints: [0 150 175 200 220 250 300 320 350 370 400]
>>
where
function S = cssm( filespec )
fid = fopen( filespec );
rows = textscan( fid, '%s', 'Delimiter', '\n' );
fclose( fid );
rows = strtrim( rows{:} );
type_list = {
... format keyword
'f1', 'RainflowCycleCounterHistogram'
'f2', 'RainflowCycleMeanBreakpoints'
'f2', 'RainflowCycleRangeBreakpoints'
'f3', 'RainflowCycleReversalTolerance'
'f3', 'PowerCylinderTemperature'
'f2', 'PowerCylinderTemperatureHistogram'
'f2', 'PowerCylinderTemperatureHistogramBreakpoints'
};
for jj = 1 : size( type_list, 1 )
switch type_list{jj,1}
case 'f1'
S.(type_list{jj,2}) = f1( type_list{jj,2}, rows );
case 'f2'
S.(type_list{jj,2}) = f2( type_list{jj,2}, rows );
case 'f3'
S.(type_list{jj,2}) = f3( type_list{jj,2}, rows );
otherwise
error( 'The format, "%s", is not yet implemented', type_list{jj,1} )
end
end
end
function matrix = f1( keyword, rows )
ism = is_member( keyword, rows );
cur_rows = rows( ism );
%
str = permute( char( cur_rows ), [2,1] );
cac = textscan( str, '%*s%f%f%f' ...
, 'Delimiter' , '[]: '...
, 'MultipleDelimsAsOne' , true ...
, 'CollectOutput' , true );
num = cac{1};
%
sz1 = min( num(:,1:2), [], 1 );
sz2 = max( num(:,1:2), [], 1 );
sz = sz2-sz1+[1,1];
ix_linear = sub2ind( sz, num(:,1)+1, num(:,2)+1 ); % one based
matrix( ix_linear ) = num(:,3);
matrix = reshape( matrix, sz );
end
function matrix = f2( keyword, rows )
ism = is_member( keyword, rows );
cur_rows = rows( ism );
%
str = permute( char( cur_rows ), [2,1] );
cac = textscan( str, '%*s%f%f' ...
, 'Delimiter' , '[]: '...
, 'MultipleDelimsAsOne' , true ...
, 'CollectOutput' , true );
num = cac{1};
%
matrix( num(:,1)+1 ) = num(:,2); % one based
end
function matrix = f3( keyword, rows )
ism = is_member( keyword, rows );
cur_rows = rows( ism );
%
str = permute( char( cur_rows ), [2,1] );
cac = textscan( str, '%*s%f', 'Delimiter',':' );
matrix = cac{:};
end
function ism = is_member( keyword, rows )
% the keyword is followed by either ":" or "["
cac = regexp( rows, ['^',keyword,'(?=(:|\[))'], 'once' );
ism = not( cellfun( @isempty, cac ) );
end
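The lookahead in is_member matters because some keywords are prefixes of others. A small sketch with made-up rows shows the difference between a plain prefix test, as used in the earlier scripts, and the lookahead test:

```matlab
% Hypothetical rows; note that the first keyword is a prefix of the other two
rows = {
    'PowerCylinderTemperature: 0.0000000000'
    'PowerCylinderTemperatureHistogram[0]: 5.0000000000'
    'PowerCylinderTemperatureHistogramBreakpoints[0]: 0.0000000000'
    };
keyword = 'PowerCylinderTemperature';

% Prefix test: matches all three rows
by_prefix = strncmp( keyword, rows, length( keyword ) );

% Lookahead test: the keyword must be followed directly by ":" or "["
cac = regexp( rows, ['^',keyword,'(?=(:|\[))'], 'once' );
by_lookahead = not( cellfun( @isempty, cac ) );
```

Only the first row passes the lookahead test, so each keyword selects exactly its own rows.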

Ibro Tutic on 25 Nov 2015
Sounds good, I'll see what I can figure out, thanks!
dpb on 25 Nov 2015
What is the desired output again? I'd approach it a little more generically, although I'm not sure precisely what you want to do with the end result. I'll note that from your file one can do the following:
>> tok=cellfun(@(x) tokens(x,'[]:'),S,'uniformoutput',0); % find tokens each line
>> whos tok
Name Size Bytes Class Attributes
tok 52x1 13660 cell
>> tok{1} % sample what looks like
ans =
RainflowCycleCounterHistogram
0
0
1.0000000000
>> ntok=cellfun(@(x) size(x,1),tok); % number in each row
>> [min(ntok) max(ntok)] % range overall in file
ans =
2 4
>> for n=min(ntok):max(ntok) % build specific format string
fmt=['%s' repmat('[%d]',1,n-2) ':%f']
end
fmt =
%s:%f
fmt =
%s[%d]:%f
fmt =
%s[%d][%d]:%f
>> [u,iu]=unique(cellfun(@(x) x(1,:),tok,'uniform',0),'stable') % what's in file and where???
u =
'RainflowCycleCounterHistogram'
'RainflowCycleMeanBreakpoints'
'RainflowCycleRangeBreakpoints'
'RainflowCycleReversalTolerance'
'PowerCylinderTemperature'
'PowerCylinderTemperatureHistogram'
'PowerCylinderTemperatureHistogramBreakpoints'
iu =
1
8
18
28
29
30
42
>>
From the above pieces one can write a general parser for each possible data line format as long as they follow the form of
String[Index1][Index2]: Value
where the number of indices can be 0, 1, or 2. The above will actually handle N-dimensional arrays; it's just that 2 is the largest seen to date.
With the above it's simple enough to write a routine that loops over the elements of the u array, builds the proper format string, and selects and parses the given lines without any specific testing for matching strings at all. If and when a user asks for only a given one or set, those can be returned from the general result.
But you don't need to parse the individual lines at all; simply convert the fields within the corresponding tok array for the entries of interest. ntok tells you how many elements each entry has.
function tok = tokens(s,d)
% Simple string parser returns tokens in input string s
%
% T=TOKENS(S) returns the tokens in the string S delimited
% by "white space". Any leading white space characters are ignored.
%
% TOKENS(S,D) returns tokens delimited by one of the
% characters in D. Any leading delimiter characters are ignored.
% DPBozarth (Rev 1 1998)
% Get initial token and set up for rest
if nargin==1
[tok,r] = strtok(s);
while ~isempty(r)
[t,r] = strtok(r);
tok = strvcat(tok,t);
end
else
[tok,r] = strtok(s,d);
while ~isempty(r)
[t,r] = strtok(r,d);
tok = strvcat(tok,t);
end
end
Also, of course, regexp can return tokens if one's got the patience to figure out the proper expression needed...
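One possible expression, as a rough sketch rather than a tested parser for the attached file: capture the name, the run of bracketed indices (possibly empty), and the value, then use the indices as subscripts into a dynamically named struct field. The sample lines and variable names are assumptions.

```matlab
% Hypothetical lines in the String[Index1][Index2]: Value layout
txt = {
    'RainflowCycleReversalTolerance: 20.0000000000'
    'RainflowCycleMeanBreakpoints[0]: -111.0000000000'
    'RainflowCycleMeanBreakpoints[1]: 100.0000000000'
    'RainflowCycleCounterHistogram[0][0]: 1.0000000000'
    'RainflowCycleCounterHistogram[1][1]: 4.0000000000'
    };

S = struct();
for k = 1:numel(txt)
    % tokens: name, run of [n] groups (may be empty), value
    t = regexp(txt{k}, '^(\w+)((?:\[\d+\])*)\s*:\s*(\S+)', 'tokens', 'once');
    if isempty(t), continue, end          % skip lines in another layout
    name = t{1};
    val  = str2double(t{3});
    if isempty(t{2})
        S.(name) = val;                   % no indices: scalar
    else
        idx = str2double(regexp(t{2}, '\d+', 'match')) + 1;   % one based
        sub = num2cell(idx);
        S.(name)(sub{:}) = val;           % any number of indices
    end
end
```

Entries that never appear in the file are left as 0 here because MATLAB pads with zeros when an array grows; preallocate with nan if missing entries need to be distinguishable.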
per isakson on 25 Nov 2015

dpb on 24 Nov 2015
>> fmt='%*s%f%f%f';
>> fid=fopen('test4.txt');
>> v(sub2ind(sz,c(:,1)+1,c(:,2)+1))=c(:,3)
v =
Columns 1 through 10
1 0 1 1000 0 0 0 1 0 0
Columns 11 through 20
0 0 0 1 0 0 0 0 0 0
>> fid=fclose(fid);