How to convert categorical data to numeric in separate columns?

Amy Rafferty on 10 Aug 2020
Commented: Adam Danz on 26 Oct 2020
% Hi! I have a dataset 'data5' with a column 'Location' that contains the values Asia, US and Africa.
% I want to convert it to 3 separate columns, one per location, each containing 1 if the row is from that location and 0 otherwise.
% This is the function I have created:
function data = categorical_values(data, var)
    uniques = unique(var);
    for i = 1:length(uniques)
        values(:, i) = double(ismember(var, uniques(i)));
    end
    t = table;
    [rows, cols] = size(values);
    for i = 1:cols
        t1 = table(values(:, i));
        t1.Properties.VariableNames = uniques(i);
        t = [t t1];
    end
    data = [t data];
end
% And this is the code I have been running, in a file called prep.m:
new = categorical_values(data5, data5.Location);
new.Location = []; % delete the old Location column
% I have been getting this error:
Error using categorical_values (line 11)
The VariableNames property is a cell array of character vectors. To
assign multiple variable names, specify names in a string array or a cell
array of character vectors.
Error in prep (line 16)
new = categorical_values(data5, data5.Location);
% Can anyone help??????? Thanks!

Answers (1)

Adam Danz on 10 Aug 2020
Edited: Adam Danz on 26 Oct 2020
Here's a more efficient solution.
% Create demo data
location = categorical({'Asia','US','Asia','Africa','Africa','US','US','Asia'}');
unqCountries = unique(location(:)')
unqCountries = 1×3 categorical array
Africa Asia US
% Create matrix of 1s and 0s.
% Columns are identified by "unqCountries"
countryIdx = location(:) == unqCountries
countryIdx = 8x3 logical array
   0   1   0
   0   0   1
   0   1   0
   1   0   0
   1   0   0
   0   0   1
   0   0   1
   0   1   0
% If you want to turn it into a table
T = array2table(countryIdx, 'VariableNames', string(unqCountries))
T = 8x3 table
    Africa    Asia     US
    ______    _____    _____
    false     true     false
    false     false    true
    false     true     false
    true      false    false
    true      false    false
    false     false    true
    false     false    true
    false     true     false
The error occurs because you are assigning a categorical value as a table variable name. Variable names must be character vectors or strings, so convert the element of uniques to a string first:
t1.Properties.VariableNames = string(uniques(i));
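For reference, the two ideas above (comparing against the unique categories, and converting the names to strings) could be combined into a shorter version of the asker's function. This is only a sketch: it assumes var can be converted with categorical, and that none of the category names collide with variable names already in data.

```matlab
function data = categorical_values(data, var)
% Prepend one 0/1 indicator column per unique value of var to the table data.
    var = categorical(var);               % no-op if var is already categorical
    uniques = unique(var(:)');            % 1-by-k row of unique categories
    values = double(var(:) == uniques);   % n-by-k matrix of 0s and 1s
    % Variable names must be strings or character vectors, not categoricals
    t = array2table(values, 'VariableNames', string(uniques));
    data = [t data];
end
```

Saved as categorical_values.m, it can be called exactly as in the question: new = categorical_values(data5, data5.Location); followed by new.Location = [];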
4 Comments
Mohammad Zahid on 26 Oct 2020
Is this the same as dummy coding or one-hot encoding?
Adam Danz on 26 Oct 2020
"Is this the same as dummy coding or one-hot encoding?"
The table T contains one binary (true/false) column per category, so it can be used in the same way as dummy variables in a regression.
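To make the distinction concrete, here is a small sketch built on the demo data from the answer. The variable names oneHot and dummy are illustrative only, and the choice of the first category (Africa) as the reference level is an assumption; any category could be dropped instead.

```matlab
% Demo data from the answer
location = categorical({'Asia','US','Asia','Africa','Africa','US','US','Asia'}');
unqCountries = unique(location(:)');
countryIdx = location(:) == unqCountries;

% One-hot encoding: one numeric 0/1 column per category
oneHot = array2table(double(countryIdx), 'VariableNames', string(unqCountries));

% Dummy coding for regression: drop one category as the reference level
% (here the first column, Africa) to avoid perfectly collinear predictors
dummy = oneHot(:, 2:end);
```

If the Statistics and Machine Learning Toolbox is installed, its dummyvar function should give a comparable full indicator matrix directly from the categorical array.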
