how to cleanly ensure column type in table is numerical

Question

Romain on 3 Jan. 2025

Direct link to this question: https://de.mathworks.com/matlabcentral/answers/2172646-how-to-cleanly-ensure-column-type-in-table-is-numerical

Commented: Walter Roberson on 5 Jan. 2025

Dear all,

I am currently struggling with tables when loading simple files. I have a CSV file to load, and I use the readtable function. In my specific example, the last 3 columns should always be numeric. They only ever contain 0's, 1's and -1's, so I expect them to be detected as numeric. Yet, depending on the contents of the other columns, their type is sometimes interpreted as char. In file_2.csv (attached), the readtable function properly reads the final 3 columns as double. In file_1.csv (also attached), the columns are read as cell arrays of chars. I am using MATLAB 9.5.0.944444 (R2018b). Yes, this is an old version, but I expect it to work properly.

So, my questions:

1. Why are the columns sometimes not read as doubles, as expected?

2. For structural reasons, I cannot select in advance the options passed to readtable to force the double type. In fact, I can't even assume how many columns the files will have, nor their types. So I can only use a vanilla data = readtable('file_1.csv'). How can I make sure that the columns containing only numbers will be read as doubles?

3. If the columns suffer from this indeterminacy anyway, is there a clean way to convert them to double? For instance, data.shock3 = str2double(data.shock3) works if the column is read as a cell array, but does not work if the column is already read as type double. A try/catch block is possible, but really ugly. I wish there were a simple, clean way to solve this.
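A minimal sketch of the check-then-convert idea from point 3, with no try/catch. It assumes a small hypothetical table named data in which shock3 arrived as text and shock1 as double; iscellstr decides whether str2double is applied, so an existing double column is left untouched.

```matlab
% Hypothetical example table: shock3 was read as text, shock1 as double.
data = table([1; 1; 1], {'0'; '0'; '-1'}, ...
    'VariableNames', {'shock1', 'shock3'});

% Convert only the columns that readtable returned as cell arrays of
% character vectors; columns that are already double are skipped.
for name = {'shock1', 'shock3'}
    if iscellstr(data.(name{1}))
        data.(name{1}) = str2double(data.(name{1}));
    end
end
```

Assigning with data.(name{1}) = ... replaces the whole table variable, which is what allows its type to change from cell to double.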

Could you please help?

Thanks a lot

Romain

1 Comment

Paul on 3 Jan. 2025

Here on Answers, the last three columns of file_1 and file_2 are both read as doubles. Maybe someone with R2018b can troubleshoot ...

T = readtable('file_1.csv');
T(1:5,end-2:end)
ans = 5x3 table
    shock1    shock2    shock3
    ______    ______    ______

      1         -1         0  
      1          0         0  
      1          0         0  
      1         -1         0  
      1          0        -1  
T = readtable('file_2.csv');
T(:,end-2:end)
ans = 3x3 table
    shock1    shock2    shock3
    ______    ______    ______

      1         -1        0   
      1          0        0   
      1         -1        0   

Answer 1

Star Strider on 3 Jan. 2025

Direct link to this answer: https://de.mathworks.com/matlabcentral/answers/2172646-how-to-cleanly-ensure-column-type-in-table-is-numerical#answer_1556840

It reads them correctly in R2024b.

Use the detectImportOptions and setvartype functions to set them to 'double'. (I can’t test this in R2018b; however, the functions were introduced in R2016b, so you should have access to them.) Since R2024b reads them correctly, I used these functions to do the reverse of what you want and converted them to 'char'. You need to convert them to 'double' instead.

csvs = dir('*.csv');
for k = 1:numel(csvs)
    filename = csvs(k).name
    T{k} = readtable(filename);
end
filename = 'file_1.csv'
filename = 'file_2.csv'
T{1}
ans = 6x6 table
         type          variable       period      shock1    shock2    shock3
    ______________    __________    __________    ______    ______    ______

    {'sign'      }    {'ffrate'}           NaT      1         -1         0  
    {'zero'      }    {'gap'   }           NaT      1          0         0  
    {'shock'     }    {'-'     }    2010-03-31      1          0         0  
    {'historical'}    {'inf'   }    2010-06-30      1         -1         0  
    {'historical'}    {'inf'   }    2010-06-30      1          0        -1  
    {'covariance'}    {'gap'   }           NaT      1         -1         0  
T{2}
ans = 3x6 table
         type          variable     period    shock1    shock2    shock3
    ______________    __________    ______    ______    ______    ______

    {'sign'      }    {'ffrate'}       1        1         -1        0   
    {'zero'      }    {'gap'   }       0        1          0        0   
    {'covariance'}    {'gap'   }     NaN        1         -1        0   
for k = 1:numel(csvs)
    filename = csvs(k).name
    opts = detectImportOptions(filename);
    opts = setvartype(opts, {'shock1','shock2','shock3'}, 'char');
    T{k} = readtable(filename, opts);
end
filename = 'file_1.csv'
filename = 'file_2.csv'
T{1}
ans = 6x6 table
         type          variable       period      shock1    shock2    shock3
    ______________    __________    __________    ______    ______    ______

    {'sign'      }    {'ffrate'}           NaT    {'1'}     {'-1'}    {'0' }
    {'zero'      }    {'gap'   }           NaT    {'1'}     {'0' }    {'0' }
    {'shock'     }    {'-'     }    2010-03-31    {'1'}     {'0' }    {'0' }
    {'historical'}    {'inf'   }    2010-06-30    {'1'}     {'-1'}    {'0' }
    {'historical'}    {'inf'   }    2010-06-30    {'1'}     {'0' }    {'-1'}
    {'covariance'}    {'gap'   }           NaT    {'1'}     {'-1'}    {'0' }
T{2}
ans = 3x6 table
         type          variable     period    shock1    shock2    shock3
    ______________    __________    ______    ______    ______    ______

    {'sign'      }    {'ffrate'}       1      {'1'}     {'-1'}    {'0'} 
    {'zero'      }    {'gap'   }       0      {'1'}     {'0' }    {'0'} 
    {'covariance'}    {'gap'   }     NaN      {'1'}     {'-1'}    {'0'} 

Try that approach.
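A sketch of the same approach pointed in the direction the question needs, with 'double' instead of 'char'. Because the attached files are not available here, it writes a small hypothetical CSV to a temporary file; the column names match the thread, but the file contents are made up for illustration.

```matlab
% Hypothetical stand-in for the attached files, written to a temp file.
filename = [tempname '.csv'];
fid = fopen(filename, 'w');
fprintf(fid, 'type,variable,shock1,shock2,shock3\n');
fprintf(fid, 'sign,ffrate,1,-1,0\n');
fprintf(fid, 'zero,gap,1,0,0\n');
fclose(fid);

% Let MATLAB detect the layout, then override the guessed type of the
% three shock columns so they are always imported as double.
opts = detectImportOptions(filename);
opts = setvartype(opts, {'shock1', 'shock2', 'shock3'}, 'double');
T = readtable(filename, opts);

delete(filename);   % clean up the temporary file
```

With the type set explicitly, any entry in those columns that cannot be parsed as a number is imported as NaN rather than turning the whole column into text.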

2 Comments

Romain on 5 Jan. 2025

Thanks for the answer. As I said, the problem is precisely that I cannot use the detectImportOptions and setvartype functions to set the columns to 'double', because I use a generic importer function that may import other tables as well. It seems I will have to write a specific function for this file and use your strategy anyway. Pity that MATLAB does not detect the right data type properly.

Star Strider on 5 Jan. 2025

As always, my pleasure!

That may have been at least partially solved in later versions (or Updates), since it isn’t a problem in R2024b. (I don’t know when the relevant change actually occurred.) So upgrading to a more recent version could be one option. The readtable function makes a very good guess at the data types; however, it may need external information (such as from detectImportOptions) to get everything correct.

I don’t know what other files you may have, or what the variable names are, so I can’t suggest a robust solution. One possibility: read the variable names first, apply some logic based on those names to the options returned by detectImportOptions, and then read the entire file again using the ‘new’ opts structure.
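A minimal sketch of that name-based idea, assuming a hypothetical naming rule in which every variable whose name starts with 'shock' must be numeric. detectImportOptions already reports the variable names from the header, so the rule can be applied to opts before the full read; files with no matching names pass through unchanged. The CSV written to a temporary file is made up for illustration.

```matlab
% Hypothetical CSV standing in for one file handled by a generic importer.
filename = [tempname '.csv'];
fid = fopen(filename, 'w');
fprintf(fid, 'type,variable,shock1,shock2,shock3\n');
fprintf(fid, 'sign,ffrate,1,-1,0\n');
fprintf(fid, 'zero,gap,1,0,0\n');
fclose(fid);

opts = detectImportOptions(filename);

% Hypothetical rule: names starting with 'shock' are forced to double.
isShock = startsWith(opts.VariableNames, 'shock');
if any(isShock)
    opts = setvartype(opts, opts.VariableNames(isShock), 'double');
end

T = readtable(filename, opts);
delete(filename);   % clean up the temporary file
```

The rule itself is the part that would need to reflect whatever naming conventions the real files follow.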

Answer 2

Walter Roberson on 3 Jan. 2025

Direct link to this answer: https://de.mathworks.com/matlabcentral/answers/2172646-how-to-cleanly-ensure-column-type-in-table-is-numerical#answer_1556845

Edited: Walter Roberson on 3 Jan. 2025

In R2018b, it is enough to do something like this:

filename = 'file_1.csv';
opt = detectImportOptions(filename);
T = readtable(filename, opt);

However, if you do this, then the period field is read as datetime objects, leading to several NaT entries. That is incorrect: the field has entries such as 1, 0, and '-' as well as dates, so the only reasonable option is to import the field as character. You can use setvartype() for that:

filename = 'file_1.csv';
opt = detectImportOptions(filename);
opt = setvartype(opt, 'period', 'char');
T = readtable(filename, opt);

It is not clear how you would process this character field afterwards.

2 Comments

Romain on 5 Jan. 2025

Thanks for the suggestion. Indeed, this leads to incorrect typing and many NaT entries, as I also experienced on my side. I wanted to avoid detectImportOptions(filename) to keep my data importer generic and usable for other tables, but it seems I may have no choice but to work on a specific solution for this one table. Thanks anyway.

Walter Roberson on 5 Jan. 2025

Using just

opt = detectImportOptions(filename);
T = readtable(filename, opt);

is pretty generic.
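If the importer has to stay fully generic, another option is to post-process whatever a plain readtable call returns. The sketch below walks every table variable and converts a text column to double only when every non-empty entry parses as a number, so genuinely textual columns such as type stay as text. The input table is a hypothetical example of what a vanilla read might produce; the all-entries-must-parse rule is an assumption, not something readtable does itself.

```matlab
% Hypothetical table as a vanilla readtable call might return it:
% shock1 and shock2 came back as text although every entry is a number.
T = table({'sign'; 'zero'}, {'1'; '1'}, {'-1'; '0'}, [0; 0], ...
    'VariableNames', {'type', 'shock1', 'shock2', 'shock3'});

names = T.Properties.VariableNames;
for k = 1:numel(names)
    col = T.(names{k});
    if iscellstr(col)
        num = str2double(col);
        % Convert only if every non-empty entry parsed as a number.
        if all(~isnan(num) | cellfun(@isempty, col))
            T.(names{k}) = num;
        end
    end
end
```

Columns that are already numeric (shock3 here) are never touched, which avoids the try/catch mentioned in the question.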
