How to cleanly ensure that a column type in a table is numeric
Romain
on 3 Jan 2025
Commented: Walter Roberson
on 5 Jan 2025
Dear all,
I am currently struggling with tables when loading simple files. I have a CSV file to load, and I use the readtable function. In my specific example, the last 3 columns should always be numeric. They only ever contain 0, 1, and -1, so I expect them to be detected as numeric. Yet, depending on the contents of the other columns, the type is sometimes interpreted as char. In file_2.csv (attached), readtable properly reads the final 3 columns as double. In file_1.csv (also attached), the columns are read as cell arrays of char. I am using MATLAB 9.5.0.944444 (R2018b). Yes, this is an old version, but I expect it to work properly.
So, my questions:
1. Why is it that the columns are sometimes not read as doubles, as expected?
2. For structural reasons, I cannot select in advance the options passed to readtable to force the double type. In fact, I can't even assume how many columns there will be in the files, nor their types. So I can only use a vanilla data = readtable('file_1.csv'). How can I make sure that the columns containing only numbers will be properly read as doubles?
3. If the columns suffer from this indeterminacy anyway, is there a clean way to convert them to double? For instance, data.shock3 = str2double(data.shock3) will work if the column is read as a cell array, but will raise an exception if the column is already read as type double. A try/catch block is possible, but really ugly. I wish there were a simple, clean way to solve that.
Could you please help?
Thanks a lot
Romain
1 Comment
Paul
on 3 Jan 2025
Here on Answers, the last three columns of file_1 and file_2 are both read as doubles. Maybe someone with R2018b can troubleshoot ...
T = readtable('file_1.csv');
T(1:5,end-2:end)
T = readtable('file_2.csv');
T(:,end-2:end)
Accepted Answer
Star Strider
on 3 Jan 2025
It reads them correctly in R2024b.
Use the detectImportOptions and setvartype functions to set them to 'double'. (I can’t test this in R2018b, however the functions were introduced in R2016b, so you should have access to them.) Since R2024b reads them correctly, I used these functions to do the reverse of what you want to do, and converted them to 'char'. You need to convert them to 'double' instead.
csvs = dir('*.csv');
for k = 1:numel(csvs)
filename = csvs(k).name
T{k} = readtable(filename);
end
T{1}
T{2}
for k = 1:numel(csvs)
filename = csvs(k).name
opts = detectImportOptions(filename);
opts = setvartype(opts, {'shock1','shock2','shock3'}, 'char');
T{k} = readtable(filename, opts);
end
T{1}
T{2}
Try that approach.
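For the direction the question actually asks about, here is a minimal sketch that forces the three columns to 'double'. It is untested in R2018b. The small CSV written at the top is a hypothetical stand-in for the attached files, and the file name shock_demo.csv is an assumption made only so the snippet runs on its own.

```matlab
% Write a small stand-in CSV (hypothetical content, NOT the attached files).
filename = fullfile(tempdir, 'shock_demo.csv');
fid = fopen(filename, 'w');
fprintf(fid, 'period,shock1,shock2,shock3\n');
fprintf(fid, '2020-01-01,0,1,-1\n');
fprintf(fid, '-,1,0,0\n');
fprintf(fid, '2020-03-01,-1,0,1\n');
fclose(fid);

% Detect the import options, then override the type of the shock columns.
opts = detectImportOptions(filename);
opts = setvartype(opts, {'shock1','shock2','shock3'}, 'double');

% Read the file with the modified options and show the resulting class.
T = readtable(filename, opts);
disp(class(T.shock3))
```

This only works when the column names are known in advance, which the question says is not always the case.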
2 Comments
Star Strider
on 5 Jan 2025
As always, my pleasure!
That may have been at least partially solved in later versions (or Updates), since that isn’t a problem in R2024b. (I don’t know when the relevant change actually occurred.) So upgrading to a more recent version could be one option. The readtable function makes a very good guess at the data type, however it may need external information (such as detectImportOptions) to get everything correct.
I don’t know what other files you may have, or what the variable names are, so I can’t suggest a robust solution. Reading the variable names first, then using some sort of logic with detectImportOptions based on the variable names, and then reading the entire file again using the ‘new’ opts structure could be a solution.
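One possible form of such logic, applied after a plain readtable call instead of before it, is sketched below. It guards the str2double conversion with iscellstr, so no try/catch is needed. The table built at the top is a hypothetical stand-in for a readtable result, and the rule "convert only if every entry parses as a number" is an assumption about what counts as a numeric column.

```matlab
% Stand-in for the result of a plain readtable call (hypothetical data):
% 'shock1' arrived as double, 'shock3' arrived as a cell array of char.
data = table({'a';'b';'c'}, [0;1;-1], {'0';'1';'-1'}, ...
    'VariableNames', {'label','shock1','shock3'});

names = data.Properties.VariableNames;
for k = 1:numel(names)
    col = data.(names{k});
    if iscellstr(col)                  % only text columns are candidates
        num = str2double(col);         % NaN wherever the text is not a number
        if ~any(isnan(num))            % convert only if every entry parsed
            data.(names{k}) = num;
        end
    end
end

disp(varfun(@class, data, 'OutputFormat', 'cell'))
```

Columns that are already double are skipped, and text columns with any non-numeric entry are left as they are. One caveat of this rule: a column containing empty fields or the literal text NaN would not be converted.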
More Answers (1)
Walter Roberson
on 3 Jan 2025
Edited: Walter Roberson
on 3 Jan 2025
In R2018b, it is enough to do something like
filename = 'file_1.csv';
opt = detectImportOptions(filename);
T = readtable(filename, opt);
However, if you do this, then the period field is read as datetime objects, leading to several NaT entries. That is incorrect: the field has entries such as 1, 0, and '-' as well as datetimes, so the only reasonable option is to import the field as character. You can use setvartype() for that:
filename = 'file_1.csv';
opt = detectImportOptions(filename);
opt = setvartype(opt, 'period', 'char');
T = readtable(filename, opt);
It is not clear how you would process this character field afterwards.
2 Comments
Walter Roberson
on 5 Jan 2025
Using just
opt = detectImportOptions(filename);
T = readtable(filename, opt);
is pretty generic.
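To keep this generic while still catching a misdetected column, the detected types can be inspected before reading. The sketch below is untested in R2018b. The demo file is hypothetical, and the rule "import anything detected as datetime as char" is an assumption chosen to mirror the period example, not a general recommendation.

```matlab
% Write a small stand-in CSV (hypothetical content, NOT the attached files).
filename = fullfile(tempdir, 'detect_demo.csv');
fid = fopen(filename, 'w');
fprintf(fid, 'period,shock1,shock2,shock3\n');
fprintf(fid, '2020-01-01,0,1,-1\n');
fprintf(fid, '2020-02-01,1,0,0\n');
fprintf(fid, '2020-03-01,-1,0,1\n');
fclose(fid);

opt = detectImportOptions(filename);

% Show what was detected: first row names, second row types.
disp([opt.VariableNames; opt.VariableTypes])

% Assumed rule: import every column detected as datetime as text instead.
isDate = strcmp(opt.VariableTypes, 'datetime');
if any(isDate)
    opt = setvartype(opt, opt.VariableNames(isDate), 'char');
end

T = readtable(filename, opt);
```

No column names are hard-coded here, so the same lines can be applied to files whose layout is not known in advance.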