How to use textscan to split a string containing numbers, NaN and strings with quotes (or not)?

Edit: the final purpose is to use textscan on a large file (~1 GB), so processing the string before applying textscan is not possible.
This is the string I want to split with "textscan":
s = '-0.27,"NAN","NAN",0.6,"22/09/17 22:59"';
I have tried several different syntaxes:
- test 1
textscan(s, '%f%f%f%f%s', 'delimiter', ',', 'CollectOutput', false, 'MultipleDelimsAsOne', 0, 'HeaderLines', 0, 'endOfLine', '\r\n', 'Whitespace', ' "')
Result: [-0.2700] [NaN] [NaN] [0.6000] {'22/09/17 22:59"'}
This is the best result; the only problem is the leftover quote at the end of the string. I don't understand why: aren't the characters listed in 'Whitespace' supposed to be removed?
- test 2
textscan(s, '%f%f%f%f%q', 'delimiter', ',', 'CollectOutput', false, 'MultipleDelimsAsOne', 0, 'HeaderLines', 0, 'endOfLine', '\r\n', 'Whitespace', ' "')
Result: [-0.2700] [NaN] [NaN] [0.6000] {'22/09/17 22:59"'}
Same as above.
- test 3
textscan(s, '%f%f%f%f%q', 'delimiter', ',', 'CollectOutput', false, 'MultipleDelimsAsOne', 0, 'HeaderLines', 0, 'endOfLine', '\r\n')
Result: [-0.2700] [0x1 double] [0x1 double] [0x1 double] {0x1 cell}
Fails to read the NANs.
- test 4
textscan(s, '%f%f%f%f"%s"', 'delimiter', ',', 'CollectOutput', false, 'MultipleDelimsAsOne', 0, 'HeaderLines', 0, 'endOfLine', '\r\n', 'Whitespace', ' "')
Result: [-0.2700] [NaN] [NaN] [0.6000] {0x1 cell}
Fails to read the string.
- test 5
textscan(s, '%f%f%f%f"%s"', 'delimiter', ',', 'CollectOutput', false, 'MultipleDelimsAsOne', 0, 'HeaderLines', 0, 'endOfLine', '\r\n')
Result: [-0.2700] [0x1 double] [0x1 double] [0x1 double] {0x1 cell}
Fails to read the NANs.
- test 6
textscan(s, '%f"%f""%f"%f"%s"', 'delimiter', ',', 'CollectOutput', false, 'MultipleDelimsAsOne', 0, 'HeaderLines', 0, 'endOfLine', '\r\n')
Result: [-0.2700] [NaN] [0x1 double] [0x1 double] {0x1 cell}
Fails to read the 2nd NAN.
- test 7
textscan(s, '%f"%f""%f"%f%q', 'delimiter', ',', 'CollectOutput', false, 'MultipleDelimsAsOne', 0, 'HeaderLines', 0, 'endOfLine', '\r\n', 'Whitespace', ' "')
Result: [-0.2700] [0x1 double] [0x1 double] [0x1 double] {0x1 cell}
Fails to read the NANs.
Any suggestions? Thanks.

Accepted Answer

Walter Roberson on 29 Sep 2017
textscan(s, '%f%f%f%f%q', 'delimiter', ',', 'treat','"NAN"')
  2 Comments
david on 29 Sep 2017
Yes, it works! But the parameter name is 'TreatAsEmpty', at least in my MATLAB version (R2014b):
textscan(s, '%f%f%f%f%q', 'delimiter', ',', 'TreatAsEmpty','"NAN"')
Thanks a lot!
Walter Roberson on 29 Sep 2017
In the version I tested in, 'TreatAsEmpty' can be abbreviated -- most parameter names can be abbreviated to their leading unique portion.
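For the large-file case mentioned in the question, the same options can presumably be passed when textscan reads from a file identifier instead of a string. A minimal sketch, assuming a small temporary file stands in for the real ~1 GB file (the variable `fname` and the second sample row are made up for illustration):

```matlab
% Hypothetical stand-in for the large data file: two sample rows.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', '-0.27,"NAN","NAN",0.6,"22/09/17 22:59"');
fprintf(fid, '%s\n', '1.5,"NAN",2.5,"NAN","22/09/17 23:00"');
fclose(fid);

% Read straight from the file. Quoted "NAN" fields are treated as empty,
% so they are filled with the default EmptyValue, which is NaN.
fid = fopen(fname, 'r');
C = textscan(fid, '%f%f%f%f%q', 'Delimiter', ',', 'TreatAsEmpty', '"NAN"');
fclose(fid);
delete(fname);

% C{1} to C{4} are numeric columns; C{5} is a cell array holding the
% date strings, with the surrounding quotes removed by %q.
```

The parameter name is spelled out in full here because, as noted above, the abbreviated form was not accepted in R2014b.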


More Answers (1)

Guillaume on 28 Sep 2017
textscan always annoys me; it seems to have lots of hidden rules that are not explicitly stated. I would guess the problem is caused by your NaNs being enclosed in quotes. The %f tells textscan to expect numbers, yet it gets strings. And if you ignore the quotes, it throws the string detection off.
The easiest fix might be to just replace the quoted NaNs with unquoted ones:
textscan(regexprep(s, '"NAN"', 'NAN', 'ignorecase'), '%f%f%f%f%q', 'delimiter', ',', 'CollectOutput', false, 'MultipleDelimsAsOne', 0, 'HeaderLines', 0, 'endOfLine', '\r\n', 'Whitespace', ' ')
This works with your example.
  3 Comments
Walter Roberson on 29 Sep 2017
Do you not have a spare gigabyte of memory that you could read the entire file into with fileread()? Though I guess you would need a second gigabyte to temporarily store the modified version.
david on 29 Sep 2017
To be honest, I did not even try; it does not sound like the best solution for large files.
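For comparison, the fileread() route discussed in these comments could look roughly like the sketch below. It assumes the whole file fits in memory (twice over, as noted above), and it uses a small temporary file as a hypothetical stand-in for the real data; `fname` and the second sample row are invented for illustration. Only the 'Delimiter' option is kept here, on the assumption that the other options in the call above match the defaults closely enough for this data.

```matlab
% Hypothetical stand-in for the data file: two sample rows.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', '-0.27,"NAN","NAN",0.6,"22/09/17 22:59"');
fprintf(fid, '%s\n', '1.5,"NAN",2.5,"NAN","22/09/17 23:00"');
fclose(fid);

% Read the entire file into one char vector (needs memory for the whole file).
txt = fileread(fname);
delete(fname);

% Strip the quotes around NAN so that %f can parse the fields as NaN.
% This creates a second, modified copy of the text in memory.
txt = regexprep(txt, '"NAN"', 'NAN', 'ignorecase');

% Parse the cleaned text; %q removes the quotes around the date strings.
C = textscan(txt, '%f%f%f%f%q', 'Delimiter', ',');
```

Compared with the 'TreatAsEmpty' approach in the accepted answer, this costs extra memory and a full pass over the text before parsing, which is the concern raised for large files.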
