Is textscan ever better than readtable?
Daniel Murphy
on 9 Mar 2018
Commented: Walter Roberson
on 3 Aug 2021
I have some legacy code that reads in data from a text file. The author seems to avoid using readtable() in favour of using textscan() to get a cell array of strings and then converting the strings to the correct format afterwards. This seems like an awkward way of doing things, and it takes a long time for big files, so my questions are:
- Is there any obvious reason to do this? Is textscan somehow more flexible/robust than readtable?
- Is readtable optimised for reading data in a specified format? (i.e. faster than reading a string and converting)
3 Comments
Von Duesenberg
on 9 Mar 2018
textscan is probably more "low-level"; but recall that readtable was only introduced in R2013b, so your legacy code may simply predate it.
Accepted Answer
Guillaume
on 9 Mar 2018
readtable internally calls textscan, but does a lot of work beforehand to automatically detect the format of the file, and afterwards to split the data into variables of the correct type. So a properly designed call to textscan, followed by direct conversion to a table, is always going to be faster than going through readtable.
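To illustrate the "textscan plus manual conversion" pattern the question describes, here is a minimal sketch (the file name, format string, and variable names are hypothetical, chosen for a CSV with one header line and three columns):

```matlab
% Hypothetical example: parse a CSV with textscan, then build the table yourself.
fid = fopen('data.csv');
C = textscan(fid, '%f%s%f', 'Delimiter', ',', 'HeaderLines', 1);
fclose(fid);

% C is a 1x3 cell array, one cell per column; assemble it into a table.
T = table(C{1}, C{2}, C{3}, 'VariableNames', {'Id', 'Name', 'Value'});
```

readtable('data.csv') would achieve much the same in one line, but here you control the exact format string and conversions.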
What little you lose in speed (file I/O probably dominates anyway, so processing speed may not be critical), you make up for with the flexibility of readtable. readtable is simply textscan on steroids (and it gets better with each release), so unless it is demonstrably slower I would always use it.
Note that early readtable wasn't as good at the autodetection. As said, it has gradually improved with each release since R2013b. The introduction of import options in R2016b made it really powerful.
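As a sketch of the import-options workflow mentioned above (file name, variable names, and the overridden type are hypothetical):

```matlab
% Let MATLAB detect the file layout, then override what it got wrong.
opts = detectImportOptions('data.csv');           % hypothetical file
opts = setvartype(opts, 'Value', 'double');       % force a column's type
opts.SelectedVariableNames = {'Id', 'Value'};     % read only these columns

T = readtable('data.csv', opts);
```

This gives much of textscan's control while keeping readtable's convenience.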
More Answers (1)
Walter Roberson
on 9 Mar 2018
textscan() can be a bit more flexible in handling challenges such as using commas for decimal point, or odd quoting, or reading time-like things "raw" because spaces in time formats confuse both textscan and readtable.
Generally, textscan() has more control over skipping data, and more control over number of lines to be processed.
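For example, textscan's second "repeat count" argument and skipping options give fine-grained control (the file name and format here are hypothetical):

```matlab
% Skip two header lines, then read at most 5 rows of two numeric columns.
fid = fopen('log.txt');
C = textscan(fid, '%f%f', 5, 'Delimiter', ',', 'HeaderLines', 2);
fclose(fid);
```

The file position is left just after the last line parsed, so you can resume with another textscan call using a different format.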
On the other hand with fairly recent updates adding the detectImportOptions facility, readtable can do some fixed-width reading that textscan struggles with.
7 Comments
Jeremy Hughes
on 3 Aug 2021
@Walter Roberson - Wow, I failed to look at the date. Not even sure why this popped up on my radar today.
Ahh, I see. It's not textscan doing it, but I see how textscan enables that in a succinct line of code. I tend to look at "modify the original data" as a last resort, whether it's in a file or a char array. Having to read the whole file into memory can be an issue.
Walter Roberson
on 3 Aug 2021
In cases where the file fits in memory, my experience is that reading as character and transforming the characters can be very effective -- relatively easy to code, and sometimes big performance gains compared to parsing line-by-line. This is especially true for semi-structured files, such as files that have repeated blocks of headers and data, or files that have fixed text with embedded numbers.
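A minimal sketch of that read-then-transform approach (the file name and the particular transformation, decimal commas to points, are hypothetical):

```matlab
% Read the whole file as one char vector, fix it up, then parse in memory.
raw = fileread('data.txt');       % hypothetical file that fits in memory
raw = strrep(raw, ',', '.');      % e.g. convert decimal commas to points
C = textscan(raw, '%f %f');       % textscan accepts a char vector directly
```

Because textscan operates on the in-memory text, no temporary file is needed.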
For example, readtable() and textscan() are not very good at reading a file that looks like
The temperature in Winnipeg at 15:17 was 93 degrees.
The temperature in Thunder Bay at 15:18 was 88 degrees.
The temperature in Newcastle On The Tyne at 15:18 was -3.8 degrees.
but reading the file as text and using regexp with the 'names' option can work really well.
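A sketch of that regexp approach for the temperature lines above (the file name and the exact pattern are assumptions):

```matlab
% Read the whole file, then pull out named tokens with one regexp call.
txt = fileread('temperatures.txt');   % hypothetical file with the lines above
tok = regexp(txt, ...
    'in (?<city>.+?) at (?<time>\d+:\d+) was (?<temp>-?\d+\.?\d*) degrees', ...
    'names');                         % struct array with fields city, time, temp

T = struct2table(tok);                % one row per matched line
T.temp = str2double(T.temp);          % tokens come back as text; convert
```

Each match becomes one struct with the named fields, so the whole semi-structured file collapses into a table in a few lines.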