Hi!
I have a list (1 column, 601 rows) of the most popular male and female first names, each marked in another column as either M (male) or F (female). I also have a list of the first names of people from a statistical survey, which does not have the same dimensions as the reference list. I want to compare the survey names with my list and mark each one as M or F if it is recognized. If a name is not found in my list, I want to leave it blank. Does anyone know how I can do this?
Many thanks in advance.

2 comments

KSSV on 11 Jul 2018
This can be done with strcmp and ismember. Can you share your data?
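For illustration, here is how the two functions KSSV mentions differ on a small, made-up reference list (refNames and its contents are hypothetical): strcmp only tells you which entries match, while ismember also returns the position of the match, which is what you need to look up the corresponding gender.

```matlab
% Hypothetical reference list of names (cell array of character vectors).
refNames = {'Anna'; 'Erik'; 'Maria'};

% strcmp compares one name against every entry and returns a logical vector.
hits = strcmp('Erik', refNames);

% ismember returns whether the name occurs and, if so, its row index.
[tf, idx] = ismember('Erik', refNames);
```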
Jan on 11 Jul 2018
What exactly is "a list"? Please post a small MATLAB snippet that creates a representative data set. That makes it much easier to suggest code.
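A representative data set of the kind Jan asks for might look like the following sketch. The names are made up, and the variable names genderlist and namelist are chosen to match the shapes described in the question (a reference list with a gender column, and a survey list of a different length).

```matlab
% Hypothetical reference list: name in column 1, gender code 'M' or 'F' in column 2 (Mx2 cell array).
genderlist = {'Anna', 'F'; 'Maria', 'F'; 'Erik', 'M'; 'Lars', 'M'};

% Hypothetical survey names (Nx1 cell array); N does not have to equal M.
namelist = {'Erik'; 'Sven'; 'Anna-Maria'; 'anna'; 'Lars'};
```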


Accepted Answer

Guillaume on 11 Jul 2018
Edited: Guillaume on 11 Jul 2018

1 vote

Very easy to do:
%inputs:
%genderlist = Mx2 cell array, 1st column name, 2nd column gender
%namelist = Nx1 cell array, list of names that need gender
%output
%namelistwithgender = Nx2 cell array, 1st column from namelist, 2nd column corresponding gender if found in genderlist, empty otherwise
[isfound, where] = ismember(namelist, genderlist(:, 1));
namelistwithgender = namelist;
namelistwithgender(isfound, 2) = genderlist(where(isfound), 2);
Note that the search is case-sensitive. To ignore case, convert both lists with lower() inside the ismember call.
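A minimal sketch of the case-insensitive variant, using small hypothetical lists: only the comparison uses the lower-case copies, so the output keeps the original spelling of the survey names.

```matlab
% Hypothetical reference and survey lists with mixed capitalization.
genderlist = {'Anna', 'F'; 'Maria', 'F'; 'Erik', 'M'};
namelist = {'anna'; 'ERIK'; 'Sven'};

% Compare lower-case copies so that 'anna', 'Anna' and 'ANNA' all match.
[isfound, where] = ismember(lower(namelist), lower(genderlist(:, 1)));

% Same assignment as in the answer; unmatched rows get an empty cell in column 2.
namelistwithgender = namelist;
namelistwithgender(isfound, 2) = genderlist(where(isfound), 2);
```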

6 comments

Alexander Engman on 11 Jul 2018
Thank you for your helpful answer. Is there any way for me to read the lists from an Excel spreadsheet rather than creating the arrays in MATLAB?
Importing spreadsheets into MATLAB is trivial. You can use the ancient xlsread (imports as a cell array) or the more modern readtable (imports as a table).
If your Excel file has a header row, readtable detects it automatically and uses it to name the columns. The code is more or less identical:
genderlist = readtable(yourgenderlistfile); %should create an Mx2 table
namelist = readtable(yournamelistfile); %should create an Nx1 table
[isfound, where] = ismember(namelist{:, 1}, genderlist{:, 1});
namelistwithgender = namelist;
namelistwithgender.gender = repmat({''}, height(namelist), 1); %start with a blank gender for every name
namelistwithgender.gender(isfound) = genderlist{where(isfound), 2}; %fill in the recognized names
Alexander Engman on 12 Jul 2018
Wow! This works amazingly well. The only problem was that I had not added a header row to my Excel spreadsheet, so the first name was used as the column header. Other than that, perfect. Thank you so much!
For a spreadsheet with no header, simply add the option 'ReadVariableNames', false to readtable.
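To see the effect of the option, the sketch below writes a tiny headerless file and reads it back. It uses a temporary CSV file rather than an Excel workbook (an assumption, to keep the example self-contained); readtable accepts the same option for .xlsx files.

```matlab
% Write a small hypothetical headerless file to a temporary location.
fname = [tempname, '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'Anna,F\nErik,M\n');
fclose(fid);

% Without 'ReadVariableNames', false the first row ('Anna,F') would be used as the header.
genderlist = readtable(fname, 'ReadVariableNames', false);
genderlist.Properties.VariableNames = {'name', 'gender'};

% Clean up the temporary file.
delete(fname);
```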
Afterwards, you can also give names to the columns:
%reading of the two files
genderlist = readtable(yourgenderlistfile, 'ReadVariableNames', false);
namelist = readtable(, 'ReadVariableNames', false);
%naming of the columns (as the default Var1, Var2, etc. are not very useful)
genderlist.Properties.VariableNames = {'name', 'gender'};
namelist.Properties.VariableNames = {'name'};
%look up gender for namelist. We can use the nice variable names instead of column indices
[isfound, where] = ismember(namelist.name, genderlist.name);
namelistwithgender = namelist;
namelistwithgender.gender = repmat({''}, height(namelist), 1); %blank gender by default
namelistwithgender.gender(isfound) = genderlist.gender(where(isfound));
Alexander Engman on 13 Jul 2018
Thank you!
Many of the names are actually combinations, or "double names", joined with a hyphen. For example, combining "Anna" and "Maria" gives "Anna-Maria". Is there a way to assign a gender if either or both of the names are recognized?
Also, how do I make the comparison case-insensitive?
Thank you so much!
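Just use lower(), strrep() and strsplit(), then look up each part:
% Assumes the namelist and genderlist tables from above, with columns 'name' and 'gender'.
refNames = lower(genderlist.name); % Reference names in lower case.
gender = repmat({''}, height(namelist), 1); % Blank if no part is recognized.
for k = 1 : height(namelist)
  theName = lower(namelist.name{k}); % Everything is lower case after this.
  theName = strrep(theName, '-', ' '); % Replace dashes with spaces.
  ca = strsplit(theName); % Get cell array of the individual names.
  [isfound, where] = ismember(ca, refNames); % Check each part against the reference list.
  if any(isfound)
    % Use the gender of the first recognized part, e.g. 'anna' in 'anna-maria'.
    gender(k) = genderlist.gender(where(find(isfound, 1)));
  end
end
namelist.gender = gender;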


More Answers (1)

Image Analyst on 11 Jul 2018

0 votes

I'd get a distribution and then use k nearest neighbors. After all, several names are used by both genders in varying proportions, such as chris, robin, ariel, sam and pat.
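The kNN step is not spelled out in the thread. As a simpler illustration of working from a distribution, the sketch below uses hypothetical per-name counts and a plain majority threshold rather than k nearest neighbors: a name gets a gender only when at least 90 % of its bearers share it, so names like chris or robin stay blank. The counts and the 0.9 threshold are assumptions, not data from the thread.

```matlab
% Hypothetical counts of people per name and gender.
names = {'chris'; 'robin'; 'anna'; 'erik'};
femaleCount = [4000; 2500; 90000; 50];
maleCount = [6000; 2600; 20; 80000];

% Share of the bearers of each name who are female.
pFemale = femaleCount ./ (femaleCount + maleCount);

% Assign a gender only when one gender clearly dominates (0.9 is an arbitrary threshold).
gender = repmat({''}, numel(names), 1);
gender(pFemale >= 0.9) = {'F'};
gender(pFemale <= 0.1) = {'M'};
```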

2 comments

Alexander Engman on 11 Jul 2018
That is a good suggestion. However, these are Swedish names, and Swedish has very few gender-neutral ones.
Then just use xlsread() to read in your reference name lists and your "test/validation" set of names, and use ismember(), something like (untested):
[numbers, names, raw] = xlsread(filename);
femaleNames = names(:, 1); % Female names in column 1.
maleNames = names(:, 2); % Male names in column 2.
testNames = names(:, 3); % Test names in column 3.
inFemaleList = false(length(testNames), 1); % Preallocate the result vectors.
inMaleList = false(length(testNames), 1);
for k = 1 : length(testNames)
inFemaleList(k) = ismember(testNames{k}, femaleNames);
inMaleList(k) = ismember(testNames{k}, maleNames);
end
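Because ismember accepts a whole cell array as its first argument, the loop can also be dropped. This sketch uses hypothetical lists in the shape the xlsread code extracts and combines the two logical vectors into the M/F/blank column the question asks for.

```matlab
% Hypothetical reference and test lists (cell arrays of character vectors).
femaleNames = {'Anna'; 'Maria'};
maleNames = {'Erik'; 'Lars'};
testNames = {'Lars'; 'Sven'; 'Anna'};

% One call per reference list; no loop needed.
inFemaleList = ismember(testNames, femaleNames);
inMaleList = ismember(testNames, maleNames);

% Build 'F', 'M' or '' (not recognized); a name in both lists ends up 'M' because that line runs last.
gender = repmat({''}, numel(testNames), 1);
gender(inFemaleList) = {'F'};
gender(inMaleList) = {'M'};
```

If the columns in the spreadsheet have different lengths, the text output of xlsread will likely contain empty entries in the shorter columns, and an empty test entry would then match them. Removing the empties from the reference lists first, for example with femaleNames(cellfun(@isempty, femaleNames)) = [];, avoids that.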

Melden Sie sich an, um zu kommentieren.
