Readtable with mixed variable types - 2021a version behaving differently than 2019a

Question

Martin Melcher on 19 Aug 2021

https://de.mathworks.com/matlabcentral/answers/1437204-readtable-with-mixed-variable-types-2021a-version-behaving-differently-than-2019a

Edited: dpb on 23 Aug 2021

Hello,

I'm using readtable to read an Excel file. The file I'm reading is not strictly organized by columns. For example, in the second column, row 1 can have a string, row 2 a number, and row 3 an empty cell.

In 2019a this was no issue, I would simply get a table with empty strings or numbers read as strings, which I could easily convert.

In 2021a some columns are detected as numeric, and if the cell contains a string, it is simply read as "NaN". If I force the variable type to 'string', I still get empty cells (rather than empty strings), which breaks my subsequent code.

Which set of options can I pass to readtable so that

- cells containing strings are read as strings

- empty cells (in columns that are otherwise populated) are read as empty strings

- the number of columns read = maximum number of columns containing data in any row?

Thanks,

Martin

4 Comments (2 older comments hidden)

Martin Melcher on 20 Aug 2021

Hello everyone,

I was not using any specific options in readtable, other than specifying the worksheet and setting "ReadVariableNames" to 0.

Thanks to the previous comment, I used readcell followed by cell2table. This worked with minimal adjustments, since empty cells are now reported as missing values where they were previously read as empty strings.
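A minimal sketch of the readcell/cell2table workaround described above. The file name and sheet are hypothetical, and the in-memory stand-in data is only there so the snippet runs without a spreadsheet on disk; the step that replaces missing cells with empty text is one possible form of the "minimal adjustments", not necessarily the one actually used.

```matlab
% Hypothetical file name and sheet, for illustration only.
filename = 'mixed.xlsx';
sheet = 1;

if isfile(filename)
    % readcell keeps each cell's own type: text stays text, numbers stay
    % numeric, and empty spreadsheet cells come back as <missing>.
    C = readcell(filename, 'Sheet', sheet);
else
    % Stand-in data shaped like a readcell result (assumption for the demo).
    C = {'a', 1; 2, 'b'; 'c', missing};
end

% Replace <missing> entries with empty text so that downstream code
% expecting empty strings keeps working.
emptyMask = cellfun(@(x) isa(x, 'missing'), C);
C(emptyMask) = {''};

% One table variable per column; columns with mixed content become
% cell-array variables.
T = cell2table(C);
```

Numbers stay numeric with this approach, so a column may hold both text and doubles; convert them afterwards if the later code expects text everywhere.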

Thanks for the good suggestion. I know there are probably much more foolproof and clean ways of coding, but I'm still surprised that backwards compatibility cannot be taken for granted after an upgrade. I have never seen that in any other programming language.

Cheers,

Martin

dpb on 22 Aug 2021

Edited: dpb on 23 Aug 2021

Unlike Fortran or C/C++, MATLAB is a proprietary product that is not bound by a standards committee. While MathWorks does attempt to maintain backwards compatibility at a given level, it is not at all unusual for them to change the operational behavior of various functions; higher-level abstractions like readtable in particular are regularly revised. The enhanced scanning is a relatively recent introduction, and it is most often a benefit: it can assess and import irregular files more accurately, at the cost of some extra overhead that is occasionally noticeable. Unfortunately, "there is no free lunch!", so once in a while a revision such as this causes a hiccup in existing code, as you've noticed here.

In general, it's probably more reliable in such a case to spend a little more time with the import options and rely less on the default processing, which is, again, somewhat of a conundrum, since the whole point is to make the function more of an "easy-to-use, no intervention" tool. Sometimes it succeeds; occasionally it ends up going the other way. There is no perfect solution other than the status quo, which isn't a viable development model either.

TMW is pretty good about documenting changes; this one occurred in R2020a:

readtable Function: Uses results of detectImportOptions function by default
Starting in R2020a, the readtable function uses the results of the detectImportOptions 
function to import tabular data. In essence, these two readtable function calls behave 
identically.
T = readtable(filename)
T = readtable(filename,detectImportOptions(filename))
Compatibility Considerations
There are several differences between the default behavior of readtable and its default 
behavior in previous releases. To call readtable with the default behavior it had up to 
R2019b, use the 'Format','auto' name-value pair argument.
T = readtable(filename,'Format','auto')
...

The whole skinny is at <release-notes-link>, although you have to navigate to the R2020a section and then the Data Import subsection.
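As a rough sketch of "spending a little more time with the import options", the following forces every column to text instead of relying on type detection. The file name and sheet are hypothetical, and the isfile guard is only there so the snippet does nothing when no such file exists. It assumes that importing as 'char' is acceptable for the later code; the number of columns is still whatever detectImportOptions detects.

```matlab
% Hypothetical file name and sheet, for illustration only.
filename = 'mixed.xlsx';
sheet = 1;

if isfile(filename)
    % Start from what readtable would detect by default in R2020a and
    % later, treating the first row as data rather than variable names.
    opts = detectImportOptions(filename, 'Sheet', sheet, ...
        'ReadVariableNames', false);

    % Import every column as text, so type detection cannot turn text
    % cells into NaN. Empty cells then take the fill value for char
    % variables, which defaults to ''.
    opts = setvartype(opts, 'char');

    T = readtable(filename, opts);

    % Alternative named in the release notes quoted above, which requests
    % the behavior readtable had up to R2019b:
    % T = readtable(filename, 'Format', 'auto');
end
```

Setting the options explicitly makes the import less dependent on the detection defaults of a given release, at the cost of a few extra lines.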

Of course, if one doesn't update every release, there's a lot to go through every six months to have any hope of staying abreast...one of the disadvantages of such an active development cycle as compared to the advantage of new features and bug fixes...it's a tradeoff everybody has to make for themselves.

For mission-critical code, it is really a conundrum...one almost has to redo the whole validation exercise on each release which may be a very expensive and time-consuming effort.
