MATLAB Answers

Liutong
0

How to tranfer the data types of the selected columns of the table from cell string to numeric

Asked by Liutong
on 6 Apr 2017
Latest activity Commented on by Liutong
on 12 Apr 2017
I want to tranfer the data type of the selected columns of the table from cellstring to numeric.
However, I am stuck by this error: cannot convert from double to cell
selected_cols={'Dailypeakwindspeed', 'Dailyprecip', ...
'Dailysnowdepth', 'Dailysnowfall'};
% tb_daily is my table
tb_daily(:,selected_cols)= varfun( @str2double, tb_daily,...
'InputVariables', selected_cols); % this line errors
I found out that tb.Var is not equivalent with tb{:,'Var'}, which is shown by the example below.
tb=table({'1';'2'});
tb(:,'Var1')= table([1;2]) % this gives error: cannot convert from double to cell
tb{:,'Var1'}= [1;2] % this gives error: cannot convert from double to cell
tb.Var1= [1;2] % this is successful! Weird!!!
Too weird!
One solution would be to use the dot syntax, but that means I have to repeat this for every selected column, therefore not satisfied.

  0 Comments

Sign in to comment.

1 Answer

Answer by Peter Perkins
on 7 Apr 2017
 Accepted Answer

What you're seeing is because parenthesis and brace assignments are into, while dot assignments replace. To see why, consider tbl(2:end-1,'Var1') = value -- you wouldn't want that to be able to change Var1's type.
There are two options:
1) Write a loop like this:
for i = 1:length(selectedVars)
varName = selectedVars{i};
tbl.(varName) = str2double(t.(varName));
end
If you want to convert all the cellstrs to numeric, you could also write the loop like this:
for i = 1:width(tbl)
var = tbl.(i);
if iscellstr(var)
tbl.(i) = str2double(var);
end
end
2) Horzontally concatenate the output from varfun with the remaining part of tbl, and rearrange the variables afterwards. Something like this:
origVarOrder = tbl.Properties.VariableNames;
selectedVars = varfun(@isnumeric,tbl,'UniformOutput',true);
tbl1 = tbl(~selectedVars);
tbl2 = varfun(@str2double,tbl,'InputVariables',selectedVars);
tbl = [tbl1 tbl2];
tbl = tbl(:,origVarOrder);
Hope this helps.

  3 Comments

Thanks for the reply. The first solution is good.
There are some typos in the second solution (the 3rd code block). The 4th line will return tbl2 all NaNs. I guess you mean the following for the 2nd line.
selectedVars = varfun(@iscellstr,tbl,'UniformOutput',true);
However, the 4th line will not return tbl2 with the original Column Names. Instead, tbl2 will have {'fun_originalname1','fun_originalname2',....} as its VariableNames. So the last line will still error.
But I get your idea.
What I actually did yesterday was to first delete the original cellstring columns and then append new transformed columns. Pretty much like what you suggested as solution 2.
However... This experience is painful. I struggled a moment searching online and had to admit that the best solution is to switch to python pandas, which has an in-built to_numeric method for the dataframe class. And dataframe can hold mixed data types even within a column. Unlike Matlab table.... Sad ....
Apologies from the mistakes, was typing without a MATLAB in front of me.
Your real problem is how those variables in the table got to be strings to begin with. This seems like an import issue that should be solved at its source. You didn't provide any information on why that is, so can't really help you there.
If a numeric array is what you want, from a table that's all cellstrs, this is how to do it:
>> t = table({'1';'2';'3'},{'4';'5';'6'})
t =
3×2 table
Var1 Var2
____ ____
'1' '4'
'2' '5'
'3' '6'
>> str2double(table2array(t))
ans =
1 4
2 5
3 6
If you want to hold a mixture of numeric and text in one variable in a table, here's how you do it:
>> t = table([1;2;3],{'4';5;'6'},{'7';'8';'9'})
t =
3×3 table
Var1 Var2 Var3
____ ____ ____
1 '4' '7'
2 [5] '8'
3 '6' '9'
Why you would want that, I don't know, it seems like something you'd want to fix.
Thank you for the reply.
I should have provided a more detailed background for the problem. I describe it below.
Background: I have a messy data.csv file. Some of the columns contain records like this
%data.csv file
col1,col2
1.0,2ss
1.2s,3.
...,...
To the best of my knowledge, I can only readtable by setting the VariableTypes as string/cellstr first and then process the strings to cut the tail. For example regexprep(col1,'^(\d+)s+','$1') gives ['1';'1.2'].
After I cut the tail comprising 's's, I convert column type from cellctr to numeric using str2double. And that was where I encountered the problem, which has been solved using your solution 2.

Sign in to comment.