How to use textscan to split a string containing numbers, NaN and strings with quotes (or not)?

Question

david on 28 Sep 2017

Direct link to this question

https://de.mathworks.com/matlabcentral/answers/358791-how-to-use-textscan-to-split-a-string-containing-numbers-nan-and-strings-with-quotes-or-not

Commented: Walter Roberson on 29 Sep 2017

Edit: the final goal is to use textscan on a large file (~1 GB), so preprocessing the string before applying textscan is not an option.

This is the string I want to split with "textscan":

s = '-0.27,"NAN","NAN",0.6,"22/09/17 22:59"';

I have tried different syntax:

- test 1

textscan(s, '%f%f%f%f%s', 'delimiter', ',', 'CollectOutput', false, 'MultipleDelimsAsOne', 0, 'HeaderLines', 0, 'endOfLine', '\r\n', 'Whitespace', ' "')

Result: [-0.2700] [NaN] [NaN] [0.6000] {'22/09/17 22:59"'}

This is the best result. The only problem is the leftover quote at the end of the string. I don't understand why it is there: aren't the characters listed in "Whitespace" supposed to be removed?

- test 2

textscan(s, '%f%f%f%f%q', 'delimiter', ',', 'CollectOutput', false, 'MultipleDelimsAsOne', 0, 'HeaderLines', 0, 'endOfLine', '\r\n', 'Whitespace', ' "')

Result: [-0.2700] [NaN] [NaN] [0.6000] {'22/09/17 22:59"'}

Same as above.

- test 3

textscan(s, '%f%f%f%f%q', 'delimiter', ',', 'CollectOutput', false, 'MultipleDelimsAsOne', 0, 'HeaderLines', 0, 'endOfLine', '\r\n')

Result: [-0.2700] [0x1 double] [0x1 double] [0x1 double] {0x1 cell}

Fails to read the NANs.

- test 4

textscan(s, '%f%f%f%f"%s"', 'delimiter', ',', 'CollectOutput', false, 'MultipleDelimsAsOne', 0, 'HeaderLines', 0, 'endOfLine', '\r\n', 'Whitespace', ' "')

Result: [-0.2700] [NaN] [NaN] [0.6000] {0x1 cell}

Fails to read the string.

- test 5

textscan(s, '%f%f%f%f"%s"', 'delimiter', ',', 'CollectOutput', false, 'MultipleDelimsAsOne', 0, 'HeaderLines', 0, 'endOfLine', '\r\n')

Result: [-0.2700] [0x1 double] [0x1 double] [0x1 double] {0x1 cell}

Fails to read the NANs.

- test 6

textscan(s, '%f"%f""%f"%f"%s"', 'delimiter', ',', 'CollectOutput', false, 'MultipleDelimsAsOne', 0, 'HeaderLines', 0, 'endOfLine', '\r\n')

Result: [-0.2700] [NaN] [0x1 double] [0x1 double] {0x1 cell}

Fails to read the second NAN.

- test 7

textscan(s, '%f"%f""%f"%f%q', 'delimiter', ',', 'CollectOutput', false, 'MultipleDelimsAsOne', 0, 'HeaderLines', 0, 'endOfLine', '\r\n', 'Whitespace', ' "')

Result: [-0.2700] [0x1 double] [0x1 double] [0x1 double] {0x1 cell}

Fails to read the NANs.

Any suggestions? Thanks.

Answer 1 (accepted answer)

Walter Roberson on 29 Sep 2017

textscan(s, '%f%f%f%f%q', 'delimiter', ',', 'treat','"NAN"')

2 comments

david on 29 Sep 2017

Yes, it works! But the parameter name is 'TreatAsEmpty', at least in my MATLAB version (2014b):

textscan(s, '%f%f%f%f%q', 'delimiter', ',', 'TreatAsEmpty','"NAN"')

Thanks a lot!

Walter Roberson on 29 Sep 2017

In the version I tested in, 'TreatAsEmpty' can be abbreviated -- most parameter names can be abbreviated to their leading unique portion.
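Since the question mentions a file of about 1 GB, here is a minimal sketch of how the accepted format and the 'TreatAsEmpty' option might be applied to a file identifier in blocks rather than to a string. The temporary file, the variable names and the block size are assumptions made for illustration; they do not come from the thread.

```matlab
% Hypothetical sample file so that the sketch runs on its own.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '-0.27,"NAN","NAN",0.6,"22/09/17 22:59"\n');
fprintf(fid, '1.5,2.5,"NAN",0.7,"22/09/17 23:00"\n');
fclose(fid);

fmt = '%f%f%f%f%q';     % format from the accepted answer
blockSize = 100000;     % rows per textscan call (arbitrary choice)

fid = fopen(fname, 'r');
while ~feof(fid)
    % Apply the format at most blockSize times, starting from the
    % current file position.
    C = textscan(fid, fmt, blockSize, ...
        'Delimiter', ',', 'TreatAsEmpty', '"NAN"');
    if isempty(C{1})
        break;          % nothing was read; avoid looping forever
    end
    numbers = [C{1:4}]; % numeric columns of this block
    stamps  = C{5};     % cell array with the quoted strings
    % ... process numbers and stamps for this block here ...
end
fclose(fid);
delete(fname);          % remove the hypothetical sample file
```

Reading in blocks keeps only one block of rows in memory at a time. The right block size depends on the available memory and on what is done with each block.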

Answer 2

Guillaume on 28 Sep 2017

textscan always annoys me; it seems to have lots of hidden rules that are not explicitly stated. I would guess the problem is caused by your NaNs being enclosed in quotes. The %f tells textscan to expect numbers, yet it gets strings. And if you ignore the quotes, it throws the string detection off.

The easiest fix might be to just replace the quoted NaNs with unquoted ones:

textscan(regexprep(s, '"NAN"', 'NAN', 'ignorecase'), '%f%f%f%f%q', 'delimiter', ',', 'CollectOutput', false, 'MultipleDelimsAsOne', 0, 'HeaderLines', 0, 'endOfLine', '\r\n', 'Whitespace', ' ')

This works with your example.

3 comments (1 older comment not shown)

Walter Roberson on 29 Sep 2017

Do you not have a spare gigabyte of memory that you could read the entire file into with fileread()? Though I guess you would need a second gigabyte to temporarily store the modified version.

david on 29 Sep 2017

To be honest, I did not even try; it does not sound like the best solution for large files.
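For comparison, a minimal sketch of the whole-file variant discussed in these comments: read the text with fileread, strip the quotes around NAN with regexprep as in Guillaume's answer, then pass the result to textscan. The temporary file and the variable names are assumptions for illustration, and the textscan options are reduced to the delimiter, so this has not been checked against the original poster's data.

```matlab
% Hypothetical sample file so that the sketch runs on its own.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '-0.27,"NAN","NAN",0.6,"22/09/17 22:59"\n');
fclose(fid);

% First copy: the entire file as one char vector.
txt = fileread(fname);

% Second copy: the same text with "NAN" replaced by NAN,
% matching the regexprep call from the answer.
txt = regexprep(txt, '"NAN"', 'NAN', 'ignorecase');

% Parse the cleaned text; %q reads the quoted date string.
C = textscan(txt, '%f%f%f%f%q', 'Delimiter', ',');

delete(fname);   % remove the hypothetical sample file
```

As noted above, this holds both the original and the modified text in memory at the same time, which is the main cost for a file of about 1 GB.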
