How to convert a line of text into numbers

Wisam on 21 Sep 2014
Commented: Wisam on 22 Sep 2014
I am trying to read this text and put it in a vector. Some of the elements must be repeated according to the number before the * symbol; for example, the first five elements should all have the value 10, and so on:
5*10 6*10.65 4.82 5*10.65 6*10.91 12.62 6*10.91 6*10.74 11.51 5*10.74 6*16.57
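perm_i = [];
fid = fopen(file_name_out);
% Skip ahead to the first line of the block to be read
textscan(fid, '%s', 1, 'delimiter', '\n', 'headerlines', row_permi_start-1);
for j = 1:row_permi_end-row_permi_start
    c = textscan(fid, '%s', 1, 'delimiter', '\n');
    astring = cell2mat(c{1});
    ind1 = find(astring == '*');          % positions of the multiplier symbols
    ind_temp = [];
    num_loc = [];
    num_1 = [];
    if ~isempty(ind1)
        indspace = find(astring == ' ');
        for k = 1:length(ind1)
            indspace1 = indspace(indspace < ind1(k));   % spaces before the k-th '*'
            num_loc(k) = length(indspace1) + 1;         % position of this entry in the line
            if isempty(indspace1)
                indspace1 = 0;                          % entry starts at the first character
            else
                indspace1 = indspace1(end);             % last space before the multiplier
            end
            % Number of extra copies to insert (multiplier minus one)
            num_1(k) = str2double(astring(indspace1+1:ind1(k)-1)) - 1;
            % Characters of the 'N*' prefix, to be deleted from the line
            ind_temp = [ind_temp, indspace1+1:ind1(k)];
        end
        astring(ind_temp) = [];
    end
    acell = textscan(astring, '%f');
    var_temp = acell{1,1};
    if ~isempty(ind1)
        for k = 1:length(ind1)
            % Shift the tail to make room, then fill the gap with copies
            var_temp(num_loc(k)+num_1(k):end+num_1(k)) = var_temp(num_loc(k):end);
            var_temp(num_loc(k)+1:num_loc(k)+num_1(k)) = var_temp(num_loc(k));
            num_loc = num_loc + num_1(k);   % later entries have moved by num_1(k)
        end
    end
    % Assumed: collect the expanded values (the original post is cut off here)
    perm_i = [perm_i; var_temp];
end
fclose(fid);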
2 Comments
John on 21 Sep 2014
I have not tried the above solutions/suggestions, but this is a natural job for regular expressions. MATLAB provides extensive regular expression (regex) functionality. It is not as convenient as Perl's, but there is enough regex support in MATLAB to collapse those loops into a few lines of code.
To get you started on regex in MATLAB, the functions you will most likely need to craft a concise solution are regexp and regexprep.
You will have to do a bit of reading and practising to get the hang of it. To give you an idea of how regex can help in parsing and manipulating the string, consider these few lines of code, which give you the starting indices of the tokens you would probably want to manipulate (whether or not they have a multiplier prepended):
myString = '5*10 6*10.65 4.82 5*10.65 6*10.91 12.62 6*10.91 6*10.74 11.51 5*10.74 6*16.57'
regexQuery = '((\d+)\*)?\d+(\.\d+)?'
indices = regexp(myString, regexQuery)
% indices = 1 6 14 19 27 35 41 49 57 63 71
The elements of indices are the starting positions of the tokens you are interested in. To repeat the numbers that have a multiplier prepended, you would have to look into the more advanced features of regexprep.
These features, rather than code that parses the string token by token, are more likely to give you graceful solutions that are maintainable and readable.
You may find MATLAB's string functions useful as well.
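As a small sketch of how those matches could be put to use (an illustration added here, not part of the original comment): the 'match' option of regexp returns the text of each entry instead of its index, and sscanf can then separate the multiplier from the value. The names entries, nums and values are made up for this example.

```matlab
myString = '5*10 6*10.65 4.82 5*10.65 6*10.91 12.62 6*10.91 6*10.74 11.51 5*10.74 6*16.57';
regexQuery = '((\d+)\*)?\d+(\.\d+)?';

% 'match' returns a cell array holding the text of every entry
entries = regexp(myString, regexQuery, 'match');

values = [];
for k = 1:numel(entries)
    % sscanf yields [count; value] for 'count*value' and a single number otherwise
    nums = sscanf(entries{k}, '%f*%f');
    if numel(nums) == 2
        values = [values repmat(nums(2), 1, nums(1))];
    else
        values = [values nums];
    end
end
```

This keeps the regular expression simple and leaves the number conversion to sscanf, at the cost of an explicit loop.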
Wisam on 22 Sep 2014
I appreciate your support, thanks.


Accepted Answer

Guillaume on 21 Sep 2014
Edited: Guillaume on 22 Sep 2014
I have not looked at your code in detail, but to convert your example into a vector of numbers I would do:
str = '5*10 6*10.65 4.82 5*10.65 6*10.91 12.62 6*10.91 6*10.74 11.51 5*10.74 6*16.57';
v = [];
for group = strsplit(str)                    % split string at spaces into groups
    groupparts = strsplit(group{1}, '*');    % split group at * (if no *, no split)
    if numel(groupparts) == 1
        % plain value without a multiplier
        v = [v str2double(groupparts{1})];
    else
        % value (second part) repeated as many times as the multiplier (first part)
        v = [v repmat(str2double(groupparts{2}), 1, str2double(groupparts{1}))];
    end
end
Or, as I said in my comment to John's answer, if you want a regexprep one-liner:
v = str2num(regexprep(str, '([^ ]+)\*([^ ]+)', '${repmat([$2 '' ''], 1, str2double($1))}'));
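To make the one-liner easier to follow, here is the same call split into two steps on a shortened input. This is only an unpacked sketch of the line above; the variable name expanded is introduced for illustration.

```matlab
str = '5*10 6*10.65 4.82';   % shortened input, for illustration only

% Step 1: the dynamic expression ${...} is evaluated for every 'count*value'
% match; $1 is the count and $2 the value, both passed in as text.
% Each match is replaced by the value (plus a space) repeated count times.
expanded = regexprep(str, '([^ ]+)\*([^ ]+)', '${repmat([$2 '' ''], 1, str2double($1))}');

% Step 2: expanded now contains only space-separated numbers,
% which str2num converts into a numeric row vector.
v = str2num(expanded);
```

Entries without a * are not touched by regexprep, so they pass through unchanged and are converted once in step 2.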

More Answers (1)

John on 21 Sep 2014
Edited: John on 21 Sep 2014
As mentioned before, regular expressions provide more intuitive solutions (once you get the hang of the basics). The short snippet below, which returns the answer as a numeric vector, seems to work:
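inputStr = '5*10 6*10.65 4.82 5*10.65 6*10.91 12.62 6*10.91 6*10.74 11.51 5*10.74 6*16.57';
% The multiplier is only captured together with its '*'; otherwise a plain
% value such as 12.62 could be split into the multiplier 1 and the value 2.62
regexQuery = '(?:(?<pre>\d+)\*)?(?<post>\d+(\.\d+)?)';
matches = regexp(inputStr, regexQuery, 'names');
res = '';
for i = 1:numel(matches)
    if isempty(matches(i).pre)
        matches(i).pre = '1';   % no multiplier: the value occurs once
    end
    % append the value text as many times as the multiplier says
    res = [res repmat([' ' matches(i).post ' '], [1 str2double(matches(i).pre)])];
end
res = str2num(res)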
It calls regexp once and uses the results in a simple loop that builds up the output string. I would consider this a crude solution (if it actually works :-) ) with a lot of superfluous code. My guess is that exploiting named captures and the command substitution functionality in regexprep could collapse all of that into 2 or 3 commands.
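One way the loop might be shortened with named captures is sketched below. It is an illustration rather than a tested replacement: it assumes that a multiplier which is absent comes back as an empty field, which str2double turns into NaN. The names m, counts and vals are made up for this example.

```matlab
inputStr = '5*10 6*10.65 4.82 5*10.65 6*10.91 12.62 6*10.91 6*10.74 11.51 5*10.74 6*16.57';

% One struct per entry, with the fields pre (multiplier) and post (value)
m = regexp(inputStr, '(?:(?<pre>\d+)\*)?(?<post>\d+(?:\.\d+)?)', 'names');

counts = str2double({m.pre});     % NaN where no multiplier was present
counts(isnan(counts)) = 1;        % such values occur exactly once
vals = str2double({m.post});

% Repeat every value according to its count and join the pieces
res = cell2mat(arrayfun(@(c, v) repmat(v, 1, c), counts, vals, 'UniformOutput', false));
```

In releases that provide repelem, the last line could probably be written as repelem(vals, counts).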
1 Comment
Guillaume on 22 Sep 2014
Edited: Guillaume on 22 Sep 2014
I would argue that regular expressions are overkill in this case, considering that you only need two calls to strsplit: one to break the string at every space and one to break each of those pieces at the '*'.
You could indeed do it with a single-line regexprep, but this involves a dynamic regular expression replacement string, which is not particularly cheap in terms of computation time (and not particularly easy to comprehend). For the record, the one-liner is:
v = str2num(regexprep(str, '([^ ]+)\*([^ ]+)', '${repmat([$2 '' ''], 1, str2double($1))}'));
Edit: On the other hand, the regexprep version is much faster than my strsplit solution.
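A rough way to compare the two approaches on your own machine is sketched below. The repetition count nRuns and the variable names are arbitrary choices for this illustration, and the timings will depend on the MATLAB release and on the length of the input.

```matlab
str = '5*10 6*10.65 4.82 5*10.65 6*10.91 12.62 6*10.91 6*10.74 11.51 5*10.74 6*16.57';
nRuns = 1000;   % arbitrary number of repetitions

% strsplit version
tic
for r = 1:nRuns
    v1 = [];
    for group = strsplit(str)
        parts = strsplit(group{1}, '*');
        if numel(parts) == 1
            v1 = [v1 str2double(parts{1})];
        else
            v1 = [v1 repmat(str2double(parts{2}), 1, str2double(parts{1}))];
        end
    end
end
tSplit = toc / nRuns;

% regexprep version
tic
for r = 1:nRuns
    v2 = str2num(regexprep(str, '([^ ]+)\*([^ ]+)', '${repmat([$2 '' ''], 1, str2double($1))}'));
end
tRegex = toc / nRuns;

fprintf('strsplit: %.3g s per call, regexprep: %.3g s per call\n', tSplit, tRegex);
```

Both loops should produce the same vector, so comparing v1 and v2 is a quick sanity check before trusting the timings.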
