Plotting repeating values in excel files vs the files that they overlap in

1 Ansicht (letzte 30 Tage)
I am trying to plot the number of times that values repeat between excel files. This code will be dealing with 200+ excel files per run that are formatted the same way.
For example, between file names "output3" and "output4", I want to see how many times a value repeats in excel columns 2 through 4. If there would be 5 values that overlap between these two excel files then on the plot, there would be a bar with a value of 5 and a label saying "between output3 and output4". I have attached my code attempt below with comments as to what I am trying to do in each section:
%% beginning of code imports excel files and the column values as variables
%wanting to find where the values overlap in columns 2 - 4
L = length(fieldnames(data));
data.idx2 = cell(L-1);
data.idx3 = data.idx2;
data.idx4 = data.idx2;
for j = 2:L
for i = 1:L-1
firstoutput = data.(sprintf("variables_%d", i));
secondoutput = data.(sprintf("variables_%d",j));
data.idx2{i} = ismember(firstoutput(:,2),secondoutput(:,2));
data.idx3{i} = ismember(firstoutput(:,3),secondoutput(:,3));
data.idx4{i} = ismember(firstoutput(:,4),secondoutput(:,4));
end
end
%wanting to obtain only the number "1" (i.e. true) to count the number of times that the values overlap
isTrue2 = cellfun(@(x)isequal(x,1),data.idx2);
isTrue3 = cellfun(@(x)isequal(x,1),data.idx3);
isTrue4 = cellfun(@(x)isequal(x,1),data.idx4);
%wanting to create a variable with all of the output file names
names = [];
for n = 1:length(cycle)
x = sprintf("output%d",n);
names = [names;x];
end
%wanting to plot the # of overlapping/ repeating values vs which files they appear in
bar(length(idx2_true),names)
bar(length(idx3_true),names)
bar(length(idx4_true),names)

Antworten (1)

Hari
Hari am 3 Sep. 2024
Hi Anette,
I understand that you want to plot the number of overlapping values between specific columns of multiple Excel files, and display these overlaps as bars on a graph, labeled with the corresponding file names.
I assume that your Excel files are consistently named and formatted, and that you have access to all files within your MATLAB environment.
You can follow the below steps to acheive the same:
  • Load Excel Files and Extract Columns: First, read the Excel files and extract the relevant columns (2 to 4) for comparison. Ensure you store these columns in a structured format for easy access.
  • Find Overlapping Values: Compare the values in each specified column between all pairs of files to find overlaps.
overlaps = zeros(numFiles, numFiles, 3);
for i = 1:numFiles-1
for j = i+1:numFiles
for col = 1:3
% Find overlapping values
overlaps(i, j, col) = sum(ismember(data{i, col}, data{j, col}));
end
end
end
  • Prepare Data for Plotting: Gather the overlap data and prepare labels for the plot.
for i = 1:numFiles-1
for j = i+1:numFiles
for col = 1:3
if overlaps(i, j, col) > 0
labels{end+1} = sprintf('output%d-output%d col%d', i, j, col+1);
overlapCounts(end+1) = overlaps(i, j, col);
end
end
end
end
end
  • Plot the Data: Use MATLAB's bar plot to visualize the number of overlaps with appropriate labels.
  • Refine and Test: Ensure the script handles all files correctly, and test with a subset of files if necessary to verify functionality.
References:
Hope this helps!

Produkte


Version

R2021b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by