MATLAB Answers

# Parsing the data out based on categories encoded in a string

1 view (last 30 days)
Madina Makhmutova on 18 Aug 2020
Commented: dpb on 20 Aug 2020
task #1: I have an excel table 'Results' that contains 'Area' values and very complex 'Label' names. I'm trying to parse the data out based on 3 different categories encoded in the Label: 1) animal number (mo#), 2) series number (Series#), 3) Color or Whole image (encoded in the last word)
task #2: due to the nature of my data, I need to 'zero-out' values that == whole image area.
task #3: Values for the repeating series numbers need to be summed up
At the end I want to get a separate table for each animal where each row will be a single summed up series and each column will be an individual color.
I started backwards, parsing out colors and zeroing them out first. I got stuck at the part where I need to add up the matching series.
I know that my code is very primitive and there must be more elegant ways of doin it. Any help will be gratly appreciated.
% Parse out results table into single colors
clc
clear
T = readtable("Vgx_Results.xlsx");
Cyan = T(1:8:end,:);
Blue = T(2:8:end,:);
Green = T(3:8:end,:);
Red = T(4:8:end,:);
Yellow = T(5:8:end,:);
Magenta = T(6:8:end,:);
Grey = T(7:8:end,:);
Whole = T(8:8:end,:);
data = [Cyan.Area, Blue.Area, Green.Area, Red.Area, Yellow.Area, Magenta.Area, Grey.Area];
% if the area for the given color ==0, imageJ returns the value of the
% whole image area in the result table
% Zero-out all areas that equal to the whole image area
islet_num = size(Cyan,1);
for r = 1:islet_num
for c = 1:7
if data(r,c) == Whole.Area(r)
new_data(r,c) = 0;
else new_data(r,c) = data(r,c);
end
end
end
% Some series were separated into several pieces. Below script adds up all
% areas values for the same series.
Label = string(Cyan.Label);
new_Label = extractBefore((extractAfter(Label,'Series')),'.tif-');
for r = 1:(size(data,1)-1)
if new_Label(r) == new_Label((r+1))
data(r,:) = data(r,:) + data((r+1),:);
data(r+1,:) = [];
continue
end
end
vars = {'Cyan','Blue','Green','Red','Yellow','Magenta','Grey'};
data = array2table(new_data,"VariableNames",vars);

#### 0 Comments

Sign in to comment.

### Accepted Answer

dpb on 19 Aug 2020
Well can get you mostly there although not sure exactly what you envision the final result to be from your description...
tRes=readtable('Results.xlsx'); % read the raw data table...
tData=table(str2double(extractBetween(tRes.Label,'Islet mo','_')), ...
str2double(extractBetween(tRes.Label,'Series','.')), ...
extractAfter(tRes.Label,'composite.tif '), ...
tRes.Area, ...
'variablenames',{'AnimalID','Series','Color','Area'}); % and parse/separate the fields
tData.Color=categorical(tData.Color); % convert colors to categorical
tData=tData(tData.Color~='Whole image',:); % get rid of "whole image"
This gets a usable table that looks like--
>> head(tData)
ans =
8×4 table
AnimalID Series Color Area
________ ______ _______ ________
923.00 3.00 Cyan 25283.01
923.00 3.00 Blue 46.23
923.00 3.00 Green 84.99
923.00 3.00 Red 0.05
923.00 3.00 Yellow 25283.01
923.00 3.00 Magenta 25283.01
923.00 3.00 Grey 25283.01
923.00 5.00 Cyan 4551.54
>>
Now to the combination of areas by group...
>> tB=varfun(@sum,tData,'InputVariables',{'Area'},'GroupingVariables',{'AnimalID','Series','Color'});
>> head(tB)
ans =
8×5 table
AnimalID Series Color GroupCount sum_Area
________ ______ _______ __________ ________
923.00 3.00 Blue 1.00 46.23
923.00 3.00 Cyan 1.00 25283.01
923.00 3.00 Green 1.00 84.99
923.00 3.00 Grey 1.00 25283.01
923.00 3.00 Magenta 1.00 25283.01
923.00 3.00 Red 1.00 0.05
923.00 3.00 Yellow 1.00 25283.01
923.00 5.00 Blue 1.00 15.65
>> tail(tB)
ans =
8×5 table
AnimalID Series Color GroupCount sum_Area
________ ______ _______ __________ ________
944.00 12.00 Yellow 3.00 3418.61
944.00 15.00 Blue 2.00 61.66
944.00 15.00 Cyan 2.00 3.45
944.00 15.00 Green 2.00 240.20
944.00 15.00 Grey 2.00 1.36
944.00 15.00 Magenta 2.00 0.89
944.00 15.00 Red 2.00 110.18
944.00 15.00 Yellow 2.00 68.25
>>
>> height(tB)
ans =
203.00
>>
200 elements too much to attach, so that's the head and tail--the max group count was 4
I don't know how you would want to combine the AnimalID/Series with the colors on a single line -- what happens to the Area totals?

#### 6 Comments

Show 3 older comments
dpb on 19 Aug 2020
I'm not convinced your code is correct, however...if there are rows of Cyan data in the original file that are sequentially located, then the 1:8:end counting you used to select the data by color is going to be off from the first one of those on.
I think you need to double-check your result is what you think it is--or the code above and the data files you've posted don't match in some magic way if the two rows in sequence are both a given color. It just "does not compute" as written above.
Madina Makhmutova on 19 Aug 2020
You are right, for the table in my first comment I used the different Result.xlsx .
I checked, the code 1:8:end seems to be working correctly, but I agree with you, It is definitely better to parse out colors based on the string instead of 1:8:end. I just didn't know how to do it correctly in the beginning.
If you don't mind helping me out once again, how would you go from a table like this :
AnimalID Series Color GroupCount sum_Area
________ ______ _______ __________ ________
923.00 3.00 Blue 1.00 46.23
923.00 3.00 Cyan 1.00 25283.01
923.00 3.00 Green 1.00 84.99
923.00 3.00 Grey 1.00 25283.01
923.00 3.00 Magenta 1.00 25283.01
923.00 3.00 Red 1.00 0.05
923.00 3.00 Yellow 1.00 25283.01
923.00 5.00 Blue 1.00 15.65
to a table like this:
Lanimal_id series_id Cyan Blue Green Red Yellow Magenta Grey
_________ _________ ______ ______ ______ ______ ______ _______ ______
923 3 25283 46.235 84.989 0.054 25283 25283 25283
923 5 4551.5 15.651 2.22 0.133 4551.5 4551.5 4551.5
923 9 7412.9 27.075 7.037 0.822 7412.9 7412.9 7412.9
923 11 0.328 264.23 108.06 254.95 22.334 57.969 5.748
923 11 0.082 77.101 46.228 162.66 10.756 3.859 3.12
923 17 0.131 2.452 4.772 2.474 0.022 2402.2 2402.2
923 17 0.7 30.886 11.908 15.301 0.919 7964.8 7964.8
923 17 2.408 53.433 22.371 8423.2 8423.2 8423.2 8423.2
dpb on 20 Aug 2020
Sorry, I didn't see this earlier...OK, just making aware that the counting by line if understood what you were doing correctly would end up with picking wrong rows if the perfect pattern of sets of eight in the specific order was ever broken...even one record off means everything past that point is off.
I'll have to mull a while -- the trivial immediate one-liner doesn't come to me otomh...is the ID/Series unique or are there multiple possible sets?
I've meeting in town here in a few minutes; will try to find some time tonight...

Sign in to comment.

### Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by