How to convert text line into numbers

Wisam on 21 Sep 2014
Commented: Wisam on 22 Sep 2014
I am trying to read the text below into a vector. Some elements must be repeated according to the number before the * symbol; for example, the first five elements should all have the value 10, and so on:
5*10 6*10.65 4.82 5*10.65 6*10.91 12.62 6*10.91 6*10.74 11.51 5*10.74 6*16.57
perm_i=[];
fid=fopen(file_name_out);
textscan(fid, '%s', 1, 'delimiter', '\n', 'headerlines', row_permi_start-1);
for j=1:row_permi_end-row_permi_start
    c=textscan(fid, '%s', 1, 'delimiter', '\n');
    astring=cell2mat(c{1});
    ind1=find(astring=='*');
    ind_temp=[];
    if ~isempty(ind1)
        for k=1:length(ind1)
            indspace=find(astring==' ');
            indspace1=indspace(indspace<ind1(k));
            display (indspace);
            if isempty(indspace1)
                indspace1=0;
            else
                indspace1=indspace1(end);
            end
            display (indspace1);
            num_loc(k)=length(indspace1)+1;
            indspace1=indspace1(end);
            display (indspace1);
            num_1(k)=str2double(astring(indspace1+1:ind1(k)-1))-1;
            ind_temp=[ind_temp,indspace1+1:ind1(k)];
            display (num_loc);
        end
        astring(ind_temp)=[];
    end
    acell=textscan(astring,'%f');
    var_temp=acell{1,1};
    if ~isempty(ind1)
        var_temp_1=var_temp;
        for k=1:length(ind1)
            var_temp(num_loc(k)+num_1(k) :end+num_1(k))=var_temp(num_loc(k):end);
            var_temp(num_loc(k)+1:num_loc(k)+num_1(k))=var_temp(num_loc(k));
            display (var_temp);
            num_loc=num_loc+num_1(k);
        end
  2 Comments
Wisam on 22 Sep 2014
I appreciate your support, thanks


Accepted Answer

Guillaume on 21 Sep 2014
Edited: Guillaume on 22 Sep 2014
I've not looked at your code (which is badly formatted), but to convert your example into a vector of numbers I would do:
str = '5*10 6*10.65 4.82 5*10.65 6*10.91 12.62 6*10.91 6*10.74 11.51 5*10.74 6*16.57';
v = [];
for group = strsplit(str) %split string at spaces into groups
    groupparts = strsplit(group{1}, '*'); %split group at * (if no *, no split)
    if numel(groupparts) == 1
        v = [v str2num(groupparts{1})]; %plain value: append once
    else
        v = [v repmat(str2num(groupparts{2}), 1, str2num(groupparts{1}))]; %count*value: append value count times
    end
end
Or as I said in my comment to John's answer, if you want to use a regexprep one liner:
v = str2num(regexprep(str, '([^ ]+)\*([^ ]+)', '${repmat([$2 '' ''], 1, str2double($1))}'));
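A variation on the strsplit loop above, as a minimal sketch rather than a definitive implementation: it collects counts and values first and expands them in one step, so v is not grown inside the loop. It assumes the groups are separated by whitespace and that every count is a positive integer. The names counts, values and idx are illustrative only, and str2double is swapped in for str2num.

```matlab
% Sketch only: assumes whitespace-separated groups and positive integer counts.
str = '5*10 6*10.65 4.82';
groups = strsplit(strtrim(str));        % one cell per group
counts = ones(1, numel(groups));        % default repeat count is 1
values = zeros(1, numel(groups));
for k = 1:numel(groups)
    parts = strsplit(groups{k}, '*');   % {'5','10'} or {'4.82'}
    if numel(parts) == 2
        counts(k) = str2double(parts{1});
    end
    values(k) = str2double(parts{end});
end
% Mark the first output position of each group, then turn the marks into
% a group index with cumsum (this avoids repelem, which needs R2015a or later).
idx = zeros(1, sum(counts));
idx(cumsum([1 counts(1:end-1)])) = 1;
v = values(cumsum(idx));
```

For the shortened example string this gives five copies of 10, six copies of 10.65 and a single 4.82.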

More Answers (1)

John on 21 Sep 2014
Edited: John on 21 Sep 2014
As mentioned before, regular expressions provide more intuitive solutions (once you get the hang of the basics). The short snippet below, which returns the answer as a numeric vector, seems to work:
input = '5*10 6*10.65 4.82 5*10.65 6*10.91 12.62 6*10.91 6*10.74 11.51 5*10.74 6*16.57';
% Optional 'count*' prefix (non-capturing group), followed by the value itself
regexQuery = '(?:(?<pre>\d+)\*)?(?<post>\d+(?:\.\d+)?)'
matches = regexp(input, regexQuery, 'names')
res = ''
for i = 1:size(matches, 2)
    if (isempty(matches(i).pre))
        matches(i).pre = '1'; % no 'count*' prefix: use the value once
    end
    % Append the value (as text) as many times as the count says
    res = [res repmat([' ' matches(i).post ' '], [1 str2num(matches(i).pre)])];
end
res = str2num(res)
It uses regexp once and the results of that in a simple loop that concatenates the nascent string. And I would consider this a crude solution (if it actually works :-) ) with a lot of superfluous code. My guess is that exploiting named captures and the command substitution functionality in regexprep could collapse all that into 2 or 3 commands.
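As a rough sketch of how the named captures could shorten this (not the regexprep route, just regexp plus str2double): it assumes that an unmatched pre token comes back empty, so that str2double turns it into NaN. The variable names m, counts, values and parts are illustrative only.

```matlab
% Sketch only: assumes unmatched named tokens are returned as empty values.
str = '5*10 4.82 12.62 2*3';
% Optional 'count*' prefix, then the value; both captured by name.
m = regexp(str, '(?:(?<pre>\d+)\*)?(?<post>\d+(?:\.\d+)?)', 'names');
counts = str2double({m.pre});      % NaN where there was no 'count*' prefix
counts(isnan(counts)) = 1;         % a bare value is used once
values = str2double({m.post});
% Repeat each value counts(k) times and concatenate the pieces.
parts = arrayfun(@(c, x) repmat(x, 1, c), counts, values, 'UniformOutput', false);
v = [parts{:}];
```

Tying the * to the count inside one optional group keeps a plain value such as 12.62 from being split into a count and a value.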
  1 Comment
Guillaume on 22 Sep 2014
I would argue that regular expressions are overkill in this case, considering you only need two strsplit calls: one to break the string at every space and one to break each of those pieces at the '*'.
You could indeed do it with a single-line regexprep, but this involves a dynamic expression in the replacement string, which is not particularly cheap in terms of computation time (and not particularly easy to comprehend). For the record, the one-liner is:
v = str2num(regexprep(str, '([^ ]+)\*([^ ]+)', '${repmat([$2 '' ''], 1, str2double($1))}'));
edit: On the other hand the regexprep is much faster than my strsplit solution.
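A rough way to check the timing on your own machine, as a sketch only: the enlarged test string is a made-up input, tic/toc is a crude timer, and the numbers will vary with MATLAB version and hardware.

```matlab
% Sketch only: hypothetical enlarged input, crude tic/toc timing.
str = strtrim(repmat('5*10 6*10.65 4.82 ', 1, 200));

% regexprep one-liner from above
tic;
v1 = str2num(regexprep(str, '([^ ]+)\*([^ ]+)', ...
    '${repmat([$2 '' ''], 1, str2double($1))}'));
tRegexprep = toc;

% strsplit loop from the accepted answer
tic;
v2 = [];
for group = strsplit(str)
    groupparts = strsplit(group{1}, '*');
    if numel(groupparts) == 1
        v2 = [v2 str2num(groupparts{1})];
    else
        v2 = [v2 repmat(str2num(groupparts{2}), 1, str2num(groupparts{1}))];
    end
end
tLoop = toc;

fprintf('regexprep: %.4f s, strsplit loop: %.4f s\n', tRegexprep, tLoop);
```

Both approaches should produce the same vector, so isequal(v1, v2) is a cheap sanity check before comparing the times.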
