Removing specific characters from string in nested cells
8 Ansichten (letzte 30 Tage)
Ältere Kommentare anzeigen
Bob Thompson
am 13 Jun. 2018
Kommentiert: Stephen23
am 30 Dez. 2022
I have a series of strings which are contained within a nested cell array (because regexp loves to nest cells), and I would like to remove any non numeric or white space characters from them so that I can convert them to doubles, namely astrick.
I'm looking for the least painful way of removing any of these special characters from all strings. I do not have a sample file to attach, sorry, but I have dictated the shape of a sample array below.
X == 1x1 cell
X{1} == 1x1 cell (because regexp can't help itself apparently)
X{1}{1} = {'1234., ';'12.,* ';'1234., ','123.,* ',' 321.,* '};
12 Kommentare
Stephen23
am 15 Jun. 2018
@Bob Nbob: you are right, it does not appear in the Mfile help. I notice that many other useful regular expression features also do not appear in the Mfile help: notably missing are dynamic expressions, lookaround operators, and named capture.
Both the inbuilt help and the page I linked to give a very useful introduction, and explain all features of regular expressions in MATLAB:
doc regexp
doc('Regular Expressions')
Akzeptierte Antwort
Paolo
am 15 Jun. 2018
Bearbeitet: Paolo
am 15 Jun. 2018
Perhaps this can easily be achieved in two steps. For your input:
1 ****TABLE1****
COLUMN1= 1.12, 2.23, 3.34, 4.45, 5.56, 6.67,
COLUMN2= 0.00, 0.00, 0.00, 0.00, 0.00, 0.00,
COLUMN3= 1.23, 0.34, 3.45, 5.78*, 6.54*, 8.23,
1 ****TABLE2****
data = fileread('CORR.txt');
expression_sub = '(?<=\d\.\d*\*?)([\*\.,])';
data = regexprep(data,expression_sub,'');
Data will now not contain those characters. Data is now:
' 1 ****TABLE1****
COLUMN1= 1.12 2.23 3.34 4.45 5.56 6.67
COLUMN2= 0.00 0.00 0.00 0.00 0.00 0.00
COLUMN3= 1.23 0.34 3.45 5.78 6.54 8.23
1 ****TABLE2****
'
Step 2. Match your data. Live regex here. The expression is greedy and will try to match as many digit, full stop, digits combinations as it can. Therefore you don't need to repmat your expression like you showed.
expression_match = '(?<=COLUMN[1,3]=\s)(\d.?\d*\s)*';
[tokens,match] = regexp(data_sub,expression_match,'tokens','match');
Matlab manipulation.
column1 = str2double(strsplit(cell2mat(tokens{1}),' '));
column3 = str2double(strsplit(cell2mat(tokens{2}),' '));
column1 =
1.1200 2.2300 3.3400 4.4500 5.5600 6.6700
column3 =
1.2300 0.3400 3.4500 5.7800 6.5400 8.2300
Weitere Antworten (1)
George Abrahams
am 30 Dez. 2022
The others are right to fix the root problem causing the tricky nested cell array. Having said that, for future reference, my deepreplace function on File Exchange / GitHub would have done exactly what you requested.
x = {{{'1234., ';'12.,* ';'1234., ';'123.,* ';' 321.,* '}}};
% Remove any character except for digits (0-9) and period (.)
match = regexpPattern('[^\d.]');
x = deepreplace(x,match,'');
% x = 1×1 cell array
% {1×1 cell}
% x{1} = 1×1 cell array
% {5×1 cell}
% x{1}{1} = 5×1 cell array
% {'1234.'}
% {'12.' }
% {'1234.'}
% {'12310'}
% {'321.' }
0 Kommentare
Siehe auch
Kategorien
Mehr zu Text Data Preparation finden Sie in Help Center und File Exchange
Produkte
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!