how to cleanly ensure column type in table is numerical

19 Ansichten (letzte 30 Tage)
Romain
Romain am 3 Jan. 2025
Kommentiert: Walter Roberson am 5 Jan. 2025
Dear all,
I am currently struggling with table when loading simple files. I have a csv file to load, and I use the readtable function. In my specific example, the last 3 columns should be numeric, always. They actually only ever contain 0's, 1's and -1's, so I expect them to be detected as numeric. Yet, depending on the contents of the other columns, it seems the type is sometimes interpreted as char. In file_2.csv (attached), the readtable function properly reads the final 3 columns as double. In file_1.csv (also attached), the columns are read as cell arrays of chars. I am using Matlab 9.5.0.944444 (R2018b). yes, this is an old version, but I expect it to work properly.
So, my questions:
1. Why is it that the columns are sometimes not read as doubles, as expected?
2. For structural reasons, I cannot select in advance the options passed to readtable to force double type. In fact, I can't even assume how many columns there will be in the files, nor their types. So I can only use a vanilla data = readtable(file_1.csv). How can I make sure that the columns containing only numbers will be properly read as doubles?
3. If the columns will suffer anyway from the indeterminacy issue, is there a clean way I can convert them to double? For instance, data.shock3 = str2double(data.shock3) will work if the column is read as cell array, but will return exception if column is already read as type double. A try/catch block is possible, but really ugly. I wish there is a simple/clean way to solve that.
Could you please help?
Thanks a lot
Romain
  1 Kommentar
Paul
Paul am 3 Jan. 2025
Here on Answers, the last three columns of file_1 and file_2 are both read as doubles. Maybe someone with R2018b can troubleshoot ...
T = readtable('file_1.csv');
T(1:5,end-2:end)
ans = 5x3 table
shock1 shock2 shock3 ______ ______ ______ 1 -1 0 1 0 0 1 0 0 1 -1 0 1 0 -1
T = readtable('file_2.csv');
T(:,end-2:end)
ans = 3x3 table
shock1 shock2 shock3 ______ ______ ______ 1 -1 0 1 0 0 1 -1 0

Melden Sie sich an, um zu kommentieren.

Akzeptierte Antwort

Star Strider
Star Strider am 3 Jan. 2025
It reads them correctly in R2024b.
Use the detectImportOptions and setvartype functions to set them to 'double'. (I can’t test this in R2018b, however the functions were introduced in R2016b, so you should havee access to them.) Since R2024b reads them correctly, I used these functions to do the reverse of what you want to do, and converted them to 'char'. You need to convert them to 'double' instead.
csvs = dir('*.csv');
for k = 1:numel(csvs)
filename = csvs(k).name
T{k} = readtable(filename);
end
filename = 'file_1.csv'
filename = 'file_2.csv'
T{1}
ans = 6x6 table
type variable period shock1 shock2 shock3 ______________ __________ __________ ______ ______ ______ {'sign' } {'ffrate'} NaT 1 -1 0 {'zero' } {'gap' } NaT 1 0 0 {'shock' } {'-' } 2010-03-31 1 0 0 {'historical'} {'inf' } 2010-06-30 1 -1 0 {'historical'} {'inf' } 2010-06-30 1 0 -1 {'covariance'} {'gap' } NaT 1 -1 0
T{2}
ans = 3x6 table
type variable period shock1 shock2 shock3 ______________ __________ ______ ______ ______ ______ {'sign' } {'ffrate'} 1 1 -1 0 {'zero' } {'gap' } 0 1 0 0 {'covariance'} {'gap' } NaN 1 -1 0
for k = 1:numel(csvs)
filename = csvs(k).name
opts = detectImportOptions(filename);
opts = setvartype(opts, {'shock1','shock2','shock3'}, 'char');
T{k} = readtable(filename, opts);
end
filename = 'file_1.csv'
filename = 'file_2.csv'
T{1}
ans = 6x6 table
type variable period shock1 shock2 shock3 ______________ __________ __________ ______ ______ ______ {'sign' } {'ffrate'} NaT {'1'} {'-1'} {'0' } {'zero' } {'gap' } NaT {'1'} {'0' } {'0' } {'shock' } {'-' } 2010-03-31 {'1'} {'0' } {'0' } {'historical'} {'inf' } 2010-06-30 {'1'} {'-1'} {'0' } {'historical'} {'inf' } 2010-06-30 {'1'} {'0' } {'-1'} {'covariance'} {'gap' } NaT {'1'} {'-1'} {'0' }
T{2}
ans = 3x6 table
type variable period shock1 shock2 shock3 ______________ __________ ______ ______ ______ ______ {'sign' } {'ffrate'} 1 {'1'} {'-1'} {'0'} {'zero' } {'gap' } 0 {'1'} {'0' } {'0'} {'covariance'} {'gap' } NaN {'1'} {'-1'} {'0'}
Try that approach.
.
  2 Kommentare
Romain
Romain am 5 Jan. 2025
Thanks for the answer. As I said, the problem is that I precisely cannot use the detectImportOptions and setvartype functions to set the columns to 'double', as I use a generic importer function that may import other tables as well. Seems like I will have to write a specific function for that and use your strategy anyway. Pitty that Matlab does not properly detect the right data type.
Star Strider
Star Strider am 5 Jan. 2025
As always, my pleasure!
That may have been at least partiially solved in later versions (or Updates), since that isn’t a problem in R2024b. (I don’t know when the relevant change actually occured.) So upgrading to a more recent version could be one option. The readtable function makes a very good guess at the data type, however it may need external information (such as detectImportOptions) to get everythinng correct.
I don’t know what other files you may have, or what the variable names are, so I can’t suggest a robust solution. Reading the variable names first and then using some sort of logic to use with detectImportOpttions based on theee variable names, and then reading the entire file again using the ‘new’ opts structure could be a solution.

Melden Sie sich an, um zu kommentieren.

Weitere Antworten (1)

Walter Roberson
Walter Roberson am 3 Jan. 2025
Bearbeitet: Walter Roberson am 3 Jan. 2025
In R2018b, it is enough to do something like
filename = 'file_1.csv';
opt = detectImportOptions(filename);
T = readtable(filename, opt);
However, if you do this, then the period field is read as datetime objects, leading to several NaT entries. That is incorrect: the field has entries such as 1, 0, and '-' as well as datetimes, so the only reasonable option is to import the field as character. You can set setvartype() for that,
filename = 'file_1.csv';
opt = detectImportOptions(filename);
opt = setvartype(opt, 'period', 'char');
T = readtable(filename, opt);
It is not clear how you would process this character field afterwards
  2 Kommentare
Romain
Romain am 5 Jan. 2025
Thanks for the suggestion. Indeed, this leads to incorrect typing and many NaT entries, as I also experienced on my side. I wanted to avoid detectImportOptions(filename) to keep my data importer generic and usable for other tables, but seems like I may have no choice and work on a specific solution for this one table. thanks anyway.
Walter Roberson
Walter Roberson am 5 Jan. 2025
Using just
opt = detectImportOptions(filename);
T = readtable(filename, opt);
is pretty generic.

Melden Sie sich an, um zu kommentieren.

Kategorien

Mehr zu Data Type Identification finden Sie in Help Center und File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by