How do I obtain information about the background color of a lot of cells (~600,000cells) in an Excel spreadsheet ?

6 Ansichten (letzte 30 Tage)
I tried to do it on my own using the already discussed question
"How do I obtain information about the background color of a cell in an Excel spreadsheet using MATLAB 7.8 (R2009a) ?"
The code does work, but it is reeally slow.
I tried to use a parfor loop for every column, but it didn't help significantly. Is there another way of getting all at once? Thanks in advance.
%Preassign
cellcolor = cell(numofrows,numofcols);
ind_colored_cell = ~cellfun(@isempty, cellcolor);
%Ueber alle Spalten iterieren
for ind_c = 1:numofcols
%Spaltennamen bestimmen
if floor(ind_c/26) < 2
colname = char(65 + ind_c -1);
elseif (2 < floor(ind_c/26))&& (floor(ind_c/26) < 26) %alles bis Spalte ZZ abgedeckt
colname = char(65 + floor(ind_c/26)-1);
colname(2) = char(65 + ind_c -26*floor(ind_c/26) -
end
%Ueber alle Zeilen iterieren
for ind_r = 1:numofrows
if ind_get_color(ind_r,ind_c)
cellname = [colname int2str(ind_r)];
%Zellfarbe bestimmen
Range = E.Range(cellname);
Range.Interior.Color;
cellColor = Range.Interior.
cellColorBinary = dec2bin(cellColor,
colorB = bin2dec(cellColorBinary(1:8));
colorG = bin2dec(cellColorBinary(9:16));
colorR = bin2dec(cellColorBinary(17:24));
if ~(colorB == 255 && colorG == 255 && colorR == 255) %Zellfarbe nicht weiss
ind_colored_cell(ind_r,ind_c) = 1;
cellcolor(ind_r,ind_c) = {[colorB colorG colorR]};
end
end
end
end

Antworten (1)

Guillaume
Guillaume am 10 Okt. 2017
parfor is not going to help at all with activex calls. If anything, it could slow the processing down as on the excel side the multiple queries need to be queued.
At first glance, I see several inefficiencies:
  • Using "A1" style of reference for ranges, meaning you have to calculate the letter. Use the Range.Cells method to pass directly your row, column coordinates to Excel without any processing required.
  • dec2bin followed by bin2dec. Conversions to strings and back are always slow. Computer are really good at manipulating bits of numbers (after all that's what they talk) so what's the point in converting the bits to characters. This should be faster:
RGB = floor(mod(cellColor ./ [65536, 256, 1], 256));
  • Storing the colours in a cell array. I'd store them in a 3D matrix:
cellcolor = zeros(numofrows, numofcolumns, 3);
%...
cellcolor(ind_r, ind_c, :) = floor(mod(cellColor ./ [65536, 256, 1], 256));
  2 Kommentare
Martin Flaskamp
Martin Flaskamp am 11 Okt. 2017
This did actually help, thanks. However not by really much.
I had another function (progressbar) to keep track of the progress, which slowed down the function a big deal. 90,000 cells take about 5 minutes now.
This works for me. Thanks.
function [ ind_c, cellcolor ] = get_excel_data_v2( filepath, ind_get_color )
%Diese Funktion erhält den Pfad ein Exceldatei, liest diese ein und gibt
%zusätzlich zu den Daten auch die Farben der Felder zurück
%Tabelle einlesen
[num,~,~] = xlsread(filepath);
%Groesse bestimmen
[numofrows, numofcolumns] = size(num);
%ActiveX Server starten, Excel öffnen
E = actxserver('Excel.Application');
%Pfad der Excel datei
myExcelSheet = 'C:\...\excelsheet.xls';
E.Workbooks.Open(myExcelSheet);
%Preassign
cellcolor = zeros(numofrows, numofcolumns, 3);
ind_colored_cell = false(numofrows, numofcolumns);
%Ueber alle Spalten iterieren
for ind_c = 1:numofcolumns
%Spaltennamen bestimmen
if floor(ind_c/26) < 2
colname = char(65 + ind_c -1);
elseif (2 < floor(ind_c/26))&& (floor(ind_c/26) < 26) %alles bis Spalte ZZ abgedeckt
colname = char(65 + floor(ind_c/26)-1);
colname(2) = char(65 + ind_c -26*floor(ind_c/26) -1);
else
error('Error in get_excel_color: Nur Spalten bis ZZ/676 abgedeckt.');
end
%Ueber alle Zeilen iterieren
for ind_r = 1:numofrows
if ind_get_color(ind_r,ind_c)
cellname = [colname int2str(ind_r)];
%Zellfarbe bestimmen
Range = E.Range(cellname);
Range.Interior.Color;
cellColor = Range.Interior.Color;
if cellColor ~= 16777215 %Zellfarbe nicht weiss
ind_colored_cell(ind_r,ind_c) = 1;
cellcolor(ind_r, ind_c, :) = floor(mod(cellColor ./ [65536, 256, 1], 256));
end
end
end
end
Guillaume
Guillaume am 11 Okt. 2017
Ultimately, what really slows you down is the 600,000 individual calls to get each cell colour. Each call has to be marshalled from matlab to excel and the result has to be marshalled back using interprocess communication, and that's always going to be slow.
I've been trying to find a way to get the colours all in one call, but I don't think it's possible so I'm afraid you're going to be stuck with a slow process.
A marginal gain of speed could be had if you stored the cell colour as is in a 2d array and did the conversion to RGB (3d) in just one operation at the end.
Note: to demonstrate that the bottleneck is simply those 600,000 calls to Interior.Color, try:
excel = actxserver('Excel.Application');
excel.Visible = true;
wb = excel.Workbooks.Add;
ws = wb.Worksheet.Item(1);
rg = ws.Range('A1:Y2400'); %range with 60,0000 cells
tic;
for celliter = 1:rg.Count
rg.Item(celliter).Interior.Color; %retrieve interior colour
end
toc;
excel.Quit;
The above asks for the whole range at once, then iterate over each cell of the range (with the Item property, this should be much faster than your approach of computing each cell address) asking for the interior colour with no processing whatsoever. For only 60,000 cells, it takes 91 seconds on my machine to query the cell colours.

Melden Sie sich an, um zu kommentieren.

Kategorien

Mehr zu Data Import from MATLAB finden Sie in Help Center und File Exchange

Produkte

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by