concatenating and comparing two datsets

2 Ansichten (letzte 30 Tage)
Sheetal Shirsat
Sheetal Shirsat am 16 Aug. 2019
Kommentiert: Sheetal Shirsat am 21 Aug. 2019
Hi,
I have two datasets in excel of cell arrays which have columns containing charaters and numbers. The first dataset has 300 rows and 10 cols, and the second dataset 250 rows and 9 columns.
Column 10th and 9th of dataset1 and datset 2 are numbers , while all other columns are characters.
  1. for the first dataset I want to concatenate the first 8 columns.
  2. For the second data sets I want to do the same as step.1
  3. If the concatenated description of 1st dataset is same as second then I want to add 10th column of first data to the 9th colmn of send datasets.
I am new here but is there a method in matlab to do is?. I was doing this in excel in the past and it has been time-consuming and I was making erros doing this manually.
Any help greatly appreciated.
Thanks.
SSR

Akzeptierte Antwort

Jon
Jon am 16 Aug. 2019
Bearbeitet: Jon am 16 Aug. 2019
Assuming that the 9th column of both data sets is the same, I think you can do this just with an innerjoin
tbl1 = readtable('data1.xlsx')
tbl2 = readtable('data2.xlsx')
% join them to make third table which adds additional column to
% second table where character columns match
tbl3 = innerjoin(tbl1,tbl2)
There are probably some use cases that Guillaume is covering in his answer that may be important, but maybe this simple approach works for what you are doing.
  21 Kommentare
Jon
Jon am 20 Aug. 2019
Sorry Guilaaume. I read your response too quickly, and just saw the assert, and didn't notice that the later code handles multiple matching rows.
Guillaumes approach is much cleaner than my suggestion of putting it in a loop. Definitely do it that way!
I think you should be all set now.
Sheetal Shirsat
Sheetal Shirsat am 21 Aug. 2019
This is Perfect !.
Thank you so very much Guillaume and Jon for your time and help.. ! Much appreciated !.
Best regards,
Shital

Melden Sie sich an, um zu kommentieren.

Weitere Antworten (1)

Guillaume
Guillaume am 16 Aug. 2019
Bearbeitet: Guillaume am 16 Aug. 2019
Something like this should work:
dataset1 = readtable('C:\somewhere\your1stexcelfile.xlsx'); %may need extra options, depending on the excel file
dataset2 = readtable('C:\somewhere\your2ndexcelfile.xlsx'); %may need extra options, depending on the excel file
assert(width(dataset1) == 10 & width(dataset2) == 9, 'Dataset content does not match your description')
dataset1 = [rowfun(@(varargin) strjoin(varargin, ''), dataset1, 'InputVariables', 1:8, 'ExtractCellContents', true), dataset1(:, 9:10)];
dataset2 = [rowfun(@(varargin) strjoin(varargin, ''), dataset2, 'InputVariables', 1:8, 'ExtractCellContents', true), dataset2(:, 9)];
joined = outerjoin(dataset1, dataset2, 'Keys', 1, 'MergeKeys', true, 'Type', 'right')
edit: rereading the description, it sounds like a right outerloin, not a left one
  3 Kommentare
Guillaume
Guillaume am 16 Aug. 2019
Bearbeitet: Guillaume am 16 Aug. 2019
Yes, I make sure that dataset1 has indeed 10 columns and dataset2 has indeed 9 columns as you've described. Clearly, if matlab throws the error, this is not the case. The number of rows is unimportant for the code, so I never check that.
As I said, it is trivial to convert cell arrays into table. At this point, you can use the join functions that do all the hard work for you:
%inputs
%raw1: a MxN cell array
%raw2: a Px(N-1) cell array
%note that columns 1:N-2 are used as keys in order to add column N of raw1 as new column N in raw2
traw1 = cell2table(raw1);
traw2 = cell2table(raw2);
tmerged = outerjoin(traw1, traw2, 'Keys', 1:size(raw1, 2)-2, 'MergeKeys', true, 'Type', 'right');
%if result is desired as a cell array
merged = table2cell(merged);
The same can be achieved the old fashioned way, with ismember indeed. However, you can't use ismember will 2D cell arrays of char vectors, so you'd have to merge each row of text as you initially requested, or assign a unique numeric id to each char vector. I'm choosing the latter option here:
%generation of numeric id for each cell array
uniquetext = unique([raw1(:, 1:end-2); raw2(:, 1:end-1)]);
[~, raw1key] = ismember(raw1, uniquetext);
[~, raw2key] = ismember(raw2, uniquetext);
%now we can use ismember to find which keys are present in both sets
[found, where] = ismember(raw2key, raw1key, 'rows');
merged = [raw2, num2cell(NaN(size(raw2, 1), 1))]; %prepare destination by adding a column of NaN
merged(found, end) = raw2(where(found), end); %and copy relevant elements
Guillaume
Guillaume am 19 Aug. 2019
Bearbeitet: Guillaume am 19 Aug. 2019
I tried Guillaume's method and its not working either because, the unique function he uses in his example isn't working for my datasets
"doesn't work" is a useless statement if you don't provide more details. What happens? Or doesn't happen? What if the full error message if there is one?
We're guessing what your inputs are and keep guessing wrong as you don't give us the full information. A simple way to resolve that and get an answer that works for you is to attach a mat file with example inputs.

Melden Sie sich an, um zu kommentieren.

Kategorien

Mehr zu Matrices and Arrays finden Sie in Help Center und File Exchange

Tags

Produkte


Version

R2017b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by