Extract numbers from mixed string

I have a file containing header lines like the following,
Test setup: MaxDistance = 60 m, Rate = 1.000, Permitted Error = 50
Operator Note: Air Temperature=20 C, Wind Speed 16.375m/s, Altitude 5km (Cloudy)
For a given parameter such as MaxDistance or Wind Speed, I would like to extract its numerical value. This is tricky because sometimes there is an equal sign, space, or units, and sometimes there is not, because different operators enter their notes differently (lesson: next time enforce consistency).
How would I extract the following: All numerical characters (ignoring spaces and equal signs but keeping decimal points) that appear after the string representing the parameter name. Stop when a letter or punctuation mark is reached. In the case of 'MaxDistance', I would obtain 60. In the case of Wind Speed, I would obtain 16.375.

2 Comments

Albert Yam on 19 Jul 2012
Edited: John Kelly on 26 Feb 2015
What have you tried?
Jianming She on 17 Jun 2020
Edited: Jianming She on 18 Jun 2020
This seems a more general way:
function numArray = extractNumFromStr(str)
str1 = regexprep(str,'[,;=]', ' ');
str2 = regexprep(regexprep(str1,'[^- 0-9.eE(,)/]',''), ' \D* ',' ');
str3 = regexprep(str2, {'\.\s','\E\s','\e\s','\s\E','\s\e'},' ');
numArray = str2num(str3);
Example:
a = 'alpha=-3.5,beta=1e-2. but gamma = -34.1'
numArray = extractNumFromStr(a)
numArray =
-3.5000 0.0100 -34.1000


Accepted Answer

Jan on 19 Jul 2012
Edited: Jan on 19 Jul 2012

19 votes

First import the file into a string, e.g. with fileread. Then you get something like this (if not, please explain all necessary details):
Str = ['Test setup: MaxDistance = 60 m, Rate = 1.000, ', ...
'Permitted Error = 50 Operator Note: Air Temperature=20 C, ', ...
'Wind Speed 16.375m/s, Altitude 5km (Cloudy)'];
Now remove all equals signs:
Str(strfind(Str, '=')) = [];
Finally you can get the values:
Key = 'MaxDistance';
Index = strfind(Str, Key);
Value = sscanf(Str(Index(1) + length(Key):end), '%g', 1);
"Index(1)" handles multiple occurrences of the key by using the first one.
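As a rough sketch of the steps above, the same strfind/sscanf idea can be looped over several keys. The Keys list and the NaN fallback for a missing key or a missing number are assumptions added for illustration; they are not part of the original answer.

```matlab
Str = ['Test setup: MaxDistance = 60 m, Rate = 1.000, ', ...
       'Permitted Error = 50 Operator Note: Air Temperature=20 C, ', ...
       'Wind Speed 16.375m/s, Altitude 5km (Cloudy)'];
Str(strfind(Str, '=')) = [];            % drop all equals signs

Keys   = {'MaxDistance', 'Wind Speed', 'Altitude'};  % hypothetical key list
Values = nan(size(Keys));               % NaN marks "key or number not found"

for k = 1:numel(Keys)
    Index = strfind(Str, Keys{k});
    if isempty(Index)
        continue                        % key does not occur in the string
    end
    % Read one number starting right after the first occurrence of the key
    Found = sscanf(Str(Index(1) + length(Keys{k}):end), '%g', 1);
    if ~isempty(Found)
        Values(k) = Found;
    end
end
```

With the sample string this should fill Values with the numbers following MaxDistance, Wind Speed and Altitude, whether or not an equals sign or a unit was typed.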

3 Comments

K E on 19 Jul 2012
Lovely, and no regexp required. Lucas, Jan extracts the characters after the Key then gets the first number. From the sscanf documentation, the %g format scans in a floating point number.
Jan on 19 Jul 2012
Removing the = is clear, I think. Then STRFIND looks for the wanted string. Afterwards SSCANF extracts the first number behind this string. Here "behind" means the position where the string is found plus the number of characters the string has.
Lorenzo on 30 Oct 2013
This works great! Just a quick question, Jan: what if you want to find all the occurrences of a numeric value between two strings? For instance, let's say you want the numeric values that can be found between MaxDistance and Altitude in the original example (i.e. 60, 1.000, 50, etc.). How can you achieve that?
I tried this:
Key1 = 'MaxDistance';
Key2 = 'Altitude';
Index1 = strfind(file, Key1);
Index2 = strfind(file, Key2);
Value = sscanf(file(Index1:Index2), '%g', 1);
but I still get nothing but the first value. Also, I don't know a priori how many numbers can be encountered between the two strings.
Thanks!
Lorenzo
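One possible way to collect every number between two keys, sketched here with regexp instead of sscanf (sscanf stops at the first non-numeric text, which is why only one value comes back). The variable names Segment and Tokens are made up for this sketch, and the pattern only covers unsigned integers and decimals.

```matlab
Str = ['Test setup: MaxDistance = 60 m, Rate = 1.000, ', ...
       'Permitted Error = 50 Operator Note: Air Temperature=20 C, ', ...
       'Wind Speed 16.375m/s, Altitude 5km (Cloudy)'];
Key1 = 'MaxDistance';
Key2 = 'Altitude';

Index1 = strfind(Str, Key1);
Index2 = strfind(Str, Key2);

% Text strictly between the end of Key1 and the start of Key2
Segment = Str(Index1(1) + length(Key1) : Index2(1) - 1);

% Every unsigned integer or decimal in that text, however many there are
Tokens = regexp(Segment, '\d+\.?\d*', 'match');
Values = str2double(Tokens);
```

Because regexp returns all matches in a cell array, the number of values does not need to be known in advance.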


More Answers (5)

Stephan Koehler on 7 Jun 2017

6 votes

Here is a one-line answer:
str2num( regexprep( Str, {'\D*([\d\.]+\d)[^\d]*', '[^\d\.]*'}, {'$1 ', ' '} ) )

2 Comments

Alexandre THIBEAULT on 27 Jan 2021
Best answer
Marco A. Acevedo Z. on 2 Apr 2021
Hi, good answer, but how can the - sign be included (if present)? Thanks.


Freddy on 19 Jul 2012

2 votes

Maybe a little bit too late, but I would also like to present my ("regexp training") solution. :)
A = regexp(Str,'(?<Keyword>(?:\w+\s*\w+))\s*=?\s*(?<Value>\d+\.?\d*)','names');
s = struct();
for i = A,
s.(genvarname(i.('Keyword'))) = str2double(i.('Value'));
end

1 Comment

Albert Yam on 19 Jul 2012
Edited: Albert Yam on 19 Jul 2012
That took a long time for me to understand what you are doing. That's cool though.
How does it skip over 'Operator Note:' ?
Edit: Never mind I get it. It doesn't have anything for ':'. The '(?:\w' has nothing to do with a ':' in the string, it is grouping the token for 'up to two words'.


Albert Yam on 19 Jul 2012

1 vote

This is how I went about it, all steps included even the errors.
teststr = 'Test setup: MaxDistance = 60 m, Rate = 1.000, Permitted Error = 50 Operator Note: Air Temperature=20 C, Wind Speed 16.375m/s, Altitude 5km (Cloudy)';
regexp(teststr,[\d])
regexp(teststr,['\d'])
regexp(teststr,['\d'],'match')
regexp(teststr,['\d+'],'match')
regexp(teststr,['\d+.?'],'match')
regexp(teststr,['\d+\.?'],'match')
regexp(teststr,['\d+\.?\d?'],'match')
regexp(teststr,['\d+\.?\d+?'],'match')
regexp(teststr,['\d+\.?\d*?'],'match')
regexp(teststr,['\d+\.?\d?'],'match')
regexp(teststr,['\d+\.?\d*'],'match')

6 Comments

K E on 19 Jul 2012
This is very useful for showing how one would construct a regular expression piece by piece. They are so cryptic if you don't use them much.
Albert Yam on 19 Jul 2012
Learning is fun.
G on 7 Nov 2013
Edited: G on 7 Nov 2013
Be careful with the last solution:
'\d+\.?\d*'
with the case:
teststr = 'Test setup: MaxDistance = 60 m, Rate = 1.000, Permitted Error = .5 Operator Note: Air Temperature=-20 C, Wind Speed 16.375m/s, Altitude 5km (Cloudy)';
it doesn't work: negative numbers and the '.xxx' number notation (Air Temperature and Permitted Error in the sample) are not matched correctly.
Has anyone already handled these cases?
G on 7 Nov 2013
Solved!
regexp(teststr,'\d+\.?\d*|-\d+\.?\d*|\.?\d*','match')
G on 7 Nov 2013
Edited: G on 13 Nov 2013
Better:
regexp(teststr,'\d+\.?\d*|-\d+\.?\d*|\.?\d+|-\.?\d+','match')
or
regexp(teststr,'-?\d+\.?\d*|-?\d*\.?\d+','match')
The -.34e-004 case remains unsolved!
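A possible pattern for the exponent case, offered as a sketch rather than a tested general solution: it allows an optional sign, either 'digits with optional fraction' or a bare '.digits', and an optional exponent part. The shortened teststr is made up for illustration. Note the assumption that any - or + directly in front of a digit or decimal point is a sign, so a hyphen in running text (e.g. 'run-3') would be read as a minus.

```matlab
% Shortened sample with a leading-dot value, a negative value and an exponent
teststr = ['Permitted Error = .5, Air Temperature=-20 C, ', ...
           'Wind Speed 16.375m/s, Offset -.34e-004'];

% optional sign, mantissa (12, 12., 12.5 or .5), optional exponent
pattern = '[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?';

tokens = regexp(teststr, pattern, 'match');
values = str2double(tokens);
```

The exponent group requires at least one digit after the e or E, so a unit or word that merely starts with 'e' right after a number is not swallowed.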
Angkur Shaikeea on 21 Oct 2021
Edited: Angkur Shaikeea on 21 Oct 2021
I need to extract
0.00000 0.00000 0.00000
0.00000 1.00000 0.00000
1.00000 0.00000 0.00000
from a text file containing
.............................................
Nodal positions:
0.00000 0.00000 0.00000
0.00000 1.00000 0.00000
1.00000 0.00000 0.00000
Nodal positions:
0.00000 0.00000 0.00000
0.00000 1.00000 0.00000
1.00000 0.00000 0.00000
Nodal positions:
0.00000 0.00000 0.00000
0.00000 1.00000 0.00000
1.00000 0.00000 0.00000
Any help using regexp?
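One way this might be approached, assuming every block starts with the literal line 'Nodal positions:' and holds three numbers per row: capture the text after each header with regexp and let sscanf turn it into a matrix. The sample Txt below is built by hand for the sketch (two blocks only); in practice it would come from fileread. The names Txt, Blocks and Nodes are made up here.

```matlab
% Hand-built sample; normally: Txt = fileread('yourfile.txt');
Txt = sprintf(['Nodal positions:\n', ...
               '0.00000 0.00000 0.00000\n', ...
               '0.00000 1.00000 0.00000\n', ...
               '1.00000 0.00000 0.00000\n', ...
               'Nodal positions:\n', ...
               '0.00000 0.00000 0.00000\n', ...
               '0.00000 1.00000 0.00000\n', ...
               '1.00000 0.00000 0.00000\n']);

% Capture the run of digits, signs, dots and whitespace after each header
Blocks = regexp(Txt, 'Nodal positions:\s*([-+\d.\s]+)', 'tokens');

Nodes = cell(size(Blocks));
for k = 1:numel(Blocks)
    % sscanf fills column-wise, so read 3-by-N and transpose to N-by-3
    Nodes{k} = sscanf(Blocks{k}{1}, '%f', [3, Inf]).';
end
```

Each cell of Nodes then holds one N-by-3 coordinate matrix, one per 'Nodal positions:' block.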


Dahai Xue on 10 Mar 2016
Edited: KSSV on 25 Jan 2021

1 vote

C.J. Harris, I put your regexp into a function that extracts all numbers. I am having a hard time finding an array operation that can use 'a' and 'b' without the loop. Hopefully somebody has ideas. Of course, it is not difficult to add more parameters or options to find "certain" numbers with preceding or following landmark strings.
function nums = regExtractNums(str)
[a,b] = regexp(str, '\d+(\.\d+)?');
nums = zeros(length(a),1);
for k = 1:length(a)
nums(k) = str2double(str(a(k):b(k)));
end
end
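Regarding the loop: one option that seems to avoid it is to ask regexp for the matched text directly with the 'match' option instead of the start and end indices, since str2double accepts a cell array of character vectors. A minimal sketch with a made-up sample string:

```matlab
str = 'Wind Speed 16.375m/s, Altitude 5km';

% 'match' returns the matched substrings as a 1-by-N cell array
tokens = regexp(str, '\d+(\.\d+)?', 'match');

% str2double converts the whole cell array at once; transpose to get
% an N-by-1 column like the loop version produces
nums = str2double(tokens).';
```

The index outputs 'a' and 'b' are then no longer needed unless the positions themselves matter.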

C.J. Harris on 19 Jul 2012

0 votes

In order to extract a certain value:
Str = ['Test setup: MaxDistance = 60 m, Rate = 1.000, ', ...
'Permitted Error = 50 Operator Note: Air Temperature=20 C, ', ...
'Wind Speed 16.375m/s, Altitude 5km (Cloudy)'];
matchWord = 'Air Temperature';
[a,b] = regexp(Str,'\d+(\.\d+)?');
strPos = find(a > strfind(Str,matchWord),1,'first');
nValue = str2double(Str(a(strPos):b(strPos)));


Asked: K E on 19 Jul 2012

Edited: 21 Oct 2021
