How to add numbers (unevaluated), calculated by analysing a tall table, to a pre-allocated matrix?
In one column of a tall table I have a fixed set of distinct strings. I'd like to count how often each of these strings occurs in the column and calculate each string's percentage share. I'd like to store these numbers in a pre-allocated matrix and use it to create a bar plot, all without having to gather the full tall column beforehand.
I'm using R2016b.
Here's example code for what I'd like to accomplish:
%Segments is the list of different strings
%Cars is a tall cell column of a table, containing the data / strings
Cars = gather(DB.CarSegment); %This step I'd like to omit
NrCarsPerSeg = zeros(2,size(Segments,2));
%This matrix stores the number of occurrences (row 1) and the percentages (row 2).
for seg = 1:size(Segments,2)
    NrCarsPerSeg(1,seg) = sum(strcmp(Cars, Segments{1,seg}));
end
% Percentage:
NrCarsPerSeg(2,:) = (NrCarsPerSeg(1,:) / sum(NrCarsPerSeg(1,:))) * 100;
barPlot = bar(diag(NrCarsPerSeg(2,:)), 'stacked');
The code above only works once Cars has been gathered and is therefore no longer a tall array. However, as the data grows, this might not always be feasible, which is why I'd rather have Cars stay an unevaluated tall array.
-----------------------------------------------
Here's what I've tried:
  • Simply omitting the gather call gives me this error:
The following error occurred converting from tall to double:
Conversion to double from tall is not possible.
  • Declaring NrCarsPerSeg as a tall matrix:
NrCarsPerSeg = tall(zeros(2,size(Segments,2)));
Inside the for loop, this gives me the following error:
For A(m,n,...) = B, m must be either a colon (:) or a tall logical vector.
To circumvent that error I created a tall index vector:
idx = tall(logical([1 0])');
This gives me the following error inside the for loop:
In the assignment A(m,n,...) = B, B must be a scalar value.
The result of 'sum(strcmp(Cars, Segments{1,seg}))' is a 'tall double (unevaluated)', while NrCarsPerSeg(idx,seg) is an "evaluated" 'tall double'. This is probably the crux of the problem. Is there a way to solve this?
Thanks a lot for reading!

Answers (2)

Steven Lord on 13 Sep 2017
Have you considered making a tall histogram, perhaps after a preprocessing call to categorical that converts the text data in your tall table into a categorical array?
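A minimal sketch of this suggestion, assuming that categorical and countcats accept tall inputs in the release in use (worth checking against the list of functions that support tall arrays for R2016b). The toy data is a hypothetical stand-in for DB.CarSegment, and Segments is passed as the value set so that the counts come back in the same order as Segments.

```matlab
% Hypothetical stand-in data; in the question this is the tall column DB.CarSegment.
Segments = {'SUV', 'Compact', 'Van'};
Cars = tall({'SUV'; 'Compact'; 'SUV'; 'Van'; 'SUV'});

% Convert the text to a tall categorical. Passing Segments as the value set
% fixes the category order, so the counts below line up with Segments.
CarsCat = categorical(Cars, Segments);

% countcats stays deferred on a tall input; only the small vector of counts
% (one entry per segment) is brought into memory by gather.
counts = gather(countcats(CarsCat));

% Percentage share of each segment, computed on the in-memory result.
percentages = counts ./ sum(counts) * 100;
```

Because only one number per segment is gathered, the full column never has to fit in memory; bar(percentages) could then be used for the plot.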
  1 Comment
Benjamin Imbach on 14 Sep 2017
Edited: Benjamin Imbach on 14 Sep 2017
Thanks for the suggestion! For this specific case, categorical arrays definitely seem to be the way to go and make things much simpler. However, what if I want to create a histogram / bar plot not from a count but from a mean value? For example, from CO2PerSeg calculated like this:
for seg = 1:size(Segments,2)
    CO2PerSeg(1,seg) = mean(CO2_Werte(strcmp(Cars, Segments{1,seg})));
end
So far I came up with another pretty hacked together way to do it. I've added it as a new answer.
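For the per-segment mean, one possible route is a grouped calculation with findgroups and splitapply. This is only a sketch under assumptions: it presumes both functions accept tall inputs in the release in use, the small table is a hypothetical stand-in for DB, and the variable name CO2 is made up for illustration. The group order for tall inputs may differ from the in-memory order, so the segment names are gathered together with the means.

```matlab
% Hypothetical stand-in for the tall table DB; the variable name CO2 is made up.
T = table({'SUV'; 'Compact'; 'SUV'; 'Van'; 'SUV'}, [200; 120; 220; 180; 210], ...
    'VariableNames', {'CarSegment', 'CO2'});
DB = tall(T);

% Assign a group number to every row; segNames lists the segment behind each number.
[G, segNames] = findgroups(categorical(DB.CarSegment));

% Mean CO2 value per group; this is still an unevaluated tall array.
meanCO2 = splitapply(@mean, DB.CO2, G);

% A single gather call evaluates both results together.
[meanCO2, segNames] = gather(meanCO2, segNames);
```

After the gather, meanCO2 holds one value per segment and segNames says which segment each value belongs to, so the result can be passed to bar directly.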

Benjamin Imbach on 14 Sep 2017
So far I've come up with another pretty hacked-together way to do it (programmers will hate me!). Although it is ugly, it works pretty well.
It uses the eval function to generate variables (yikes!). That way the new variables are automatically of type 'unevaluated tall double', so there is no conflict anymore. All the generated variables are put into an array, which is then also automatically an 'unevaluated tall array'. This array can then easily be used for a histogram or bar plot. This way only the results have to be gathered, which is much more efficient.
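for seg = 1:size(Segments,2)
    eval(['NrCarsPerSeg_' num2str(seg) ' = sum(strcmp(Cars, Segments{1,seg}));']);
end
% Build the variable list in numeric order. who() sorts names alphabetically
% (NrCarsPerSeg_10 before NrCarsPerSeg_2) and would also match NrCarsPerSeg_Perc on a re-run.
Variables = sprintf('NrCarsPerSeg_%d ', 1:size(Segments,2));
eval(['NrCarsPerSeg = [' Variables '];'])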
% Percentage:
NrCarsPerSeg_Perc = (NrCarsPerSeg ./ sum(NrCarsPerSeg)) * 100;
[NrCarsPerSeg, NrCarsPerSeg_Perc] = gather(NrCarsPerSeg, NrCarsPerSeg_Perc);
barPlot = bar(diag(NrCarsPerSeg_Perc), 'stacked');
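The same idea can be sketched without eval by keeping the unevaluated results in a cell array and handing them all to one gather call, which returns one output per tall input. This is a minimal sketch, not a tested replacement: the toy data is a hypothetical stand-in for DB.CarSegment, and the variable name counts is introduced here for illustration.

```matlab
% Hypothetical stand-in data; in the question this is the tall column DB.CarSegment.
Segments = {'SUV', 'Compact', 'Van'};
Cars = tall({'SUV'; 'Compact'; 'SUV'; 'Van'; 'SUV'});

% Collect one unevaluated tall scalar per segment in an ordinary cell array.
nSeg = numel(Segments);
counts = cell(1, nSeg);
for seg = 1:nSeg
    counts{seg} = sum(strcmp(Cars, Segments{seg}));
end

% Evaluate all deferred sums in a single gather call (one output per input).
[counts{:}] = gather(counts{:});

% From here on everything is an ordinary in-memory double.
NrCarsPerSeg = [counts{:}];
NrCarsPerSeg_Perc = NrCarsPerSeg ./ sum(NrCarsPerSeg) * 100;
```

A cell array can hold tall arrays like any other value, so no dynamically named variables are needed, and the segment order is simply the loop order.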