Trying to average values from specific cells in a similarity matrix

Question

Wendi Fellner on 3 Sep 2022

Direct link to this question:

https://de.mathworks.com/matlabcentral/answers/1794155-trying-to-average-values-from-specific-cells-in-a-similarity-matrix

Edited: Wendi Fellner on 10 Sep 2022

I have a group of 10 vectors that represent 10 unique items I've compared to each other to assess their similarity in relation to each other. That is, they've been assigned into categories if their similarity exceeds a threshold. What I have from this process is an upper triangle similarity matrix that looks something like this where the top row and left column are the names of the categories:

   10     20     20     20     20      7      7      7      7     12
  NaN      0      0      0      0   51.3   50.5   50.4   50.5   76.5
  NaN    NaN   99.7   99.6   99.3   85.3   86.0   85.9   85.9      0
  NaN    NaN    NaN   99.5   99.3   85.2   85.8   85.8   85.8      0
  NaN    NaN    NaN    NaN   99.5   85.4   86.0   86.0   86.0      0
  NaN    NaN    NaN    NaN    NaN   85.3   85.9   85.9   85.9      0
  NaN    NaN    NaN    NaN    NaN    NaN   99.2   99.0   99.2      0
  NaN    NaN    NaN    NaN    NaN    NaN    NaN   99.8   99.7      0
  NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN   99.7      0
  NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN      0
  NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN

For my next step, what I want to do is find the average similarity for items that have been placed into a category together as compared to their similarity with items that do not share their category. That is, I want to average the similarity of the Cat20s (99.7, 99.6, 99.3, 99.5, 99.3, and 99.5) and the Cat7s (99.2, 99.0, 99.2, 99.8, 99.7, and 99.7) so that I can compare it to the similarity values of out-of-category items (0, 0, 0, 0, 51.3, 50.5, 50.4, 50.5, 76.5, 85.3, 86.0, 85.9, 85.9, 0, etc.). What I'm trying to do is assess the effectiveness of the categorization scheme.

I have tried to think through this, but I can't find an approach that I think will work. (I'm pretty new at this, so maybe there is something obvious I haven't thought of.)

Many thanks in advance!


Answer 1

Wendi Fellner on 10 Sep 2022

Direct link to this answer:

https://de.mathworks.com/matlabcentral/answers/1794155-trying-to-average-values-from-specific-cells-in-a-similarity-matrix#answer_1050890

Edited: Wendi Fellner on 10 Sep 2022

I went back to the drawing board and figured out a way to do it. :-) Here's what I came up with. (Thank you dpb for all the time and effort and patience in working on this. I may not have communicated clearly what I was trying to do.)

% Create an index for values that are within-category and another index
% for those that are between categories
n = size(label_matrix,1);                  % label_matrix is square: label row/column plus the data
idxwithin = false(size(label_matrix));     % marks values whose row and column share a category
idxbetween = false(size(label_matrix));    % marks values whose row and column do NOT share a category
for column = 2:n                           % loop across each column header
    for row = 2:n                          % loop down each row header
        if label_matrix(1,column) == label_matrix(row,1)   % column header equals row header
            idxwithin(row,column) = true;
        else
            idxbetween(row,column) = true;
        end
    end
end

% Find the means of within- and between-category values, excluding NaNs
withinCatMean = mean(label_matrix(idxwithin),'all','omitnan')
betweenCatMean = mean(label_matrix(idxbetween),'all','omitnan')
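For comparison, the same two masks can be built without loops. This is only a minimal sketch, not code from the thread: it assumes MATLAB R2016b or later (for implicit expansion), and the small matrix `S` and vector `cats` are made-up stand-ins for the real data, with no label row or column.

```matlab
% Hypothetical toy data: 4 items in categories 20, 20, 7, 7
cats = [20 20 7 7];
S = [NaN  99  85  86;
     NaN NaN  84  85;
     NaN NaN NaN  98;
     NaN NaN NaN NaN];                % upper triangle only, NaN elsewhere

sameCat = (cats(:) == cats(:).');     % true where row and column share a category
upper   = triu(true(size(S)), 1);     % true strictly above the diagonal

withinCatMean  = mean(S(sameCat & upper))     % averages 99 and 98
betweenCatMean = mean(S(~sameCat & upper))    % averages 85, 86, 84 and 85
```

Restricting both masks to the strict upper triangle means no NaN ever reaches `mean`, so the 'omitnan' flag is not needed in this variant.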


Answer 2

dpb on 3 Sep 2022

Direct link to this answer:

https://de.mathworks.com/matlabcentral/answers/1794155-trying-to-average-values-from-specific-cells-in-a-similarity-matrix#answer_1041290

Edited: dpb on 5 Sep 2022

Not too bad ... use logical addressing to find the locations and the mean with the 'omitnan' argument over the values returned...

Generically, you can write something like the following (either augment the array with a NaN in the (1,1) position or, as here, build the CATS array independently, depending on how you have the data originally):

CATS=[10 20 20 20 20 7 7 7 7 12].';     % the categories in respective position in array
C=unique(CATS);                         % the unique categories over which to iterate
%A=A(2:end,2:end);     % or A if you don't include the extraneous row/column to begin with
M=zeros(size(C));                       % how many means there are possible -- one/category
for i=1:numel(M)
  ixcat=(CATS==C(i));                   % get the index into the array column/row -- same since symmetric
  M(i)=mean(A(logical(ixcat.*ixcat.')),'all','omitnan');    % expand vector to logical array, select, compute
end

results in

>> disp([C M])
    7.0000   99.4333
   10.0000       NaN
   12.0000       NaN
   20.0000   99.4833
>> 

In this case only two categories have any finite elements, but the above will work in general regardless of the size or the number of rows/columns per category. You can always retain only the finite results in the end.
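As a check, the loop can be run on the 10-by-10 example from the question. The sketch below types that matrix in by hand as `A` (without the label row), writes the mask as `ixcat & ixcat.'` (which selects the same cells as the product form and needs implicit expansion, R2016b or later), and adds a between-category mean. The names `mask`, `betweenMask` and `betweenMean` are illustrative and not part of the original answer.

```matlab
CATS = [10 20 20 20 20 7 7 7 7 12].';
% Upper-triangle values from the question; NaN on and below the diagonal
A = NaN(10);
A(1,2:10) = [0 0 0 0 51.3 50.5 50.4 50.5 76.5];
A(2,3:10) = [99.7 99.6 99.3 85.3 86.0 85.9 85.9 0];
A(3,4:10) = [99.5 99.3 85.2 85.8 85.8 85.8 0];
A(4,5:10) = [99.5 85.4 86.0 86.0 86.0 0];
A(5,6:10) = [85.3 85.9 85.9 85.9 0];
A(6,7:10) = [99.2 99.0 99.2 0];
A(7,8:10) = [99.8 99.7 0];
A(8,9:10) = [99.7 0];
A(9,10)   = 0;

C = unique(CATS);
M = zeros(size(C));
for i = 1:numel(C)
    ixcat = (CATS == C(i));
    mask  = ixcat & ixcat.';              % square logical mask for this category
    M(i)  = mean(A(mask), 'omitnan');     % A(mask) is already a column vector
end
disp([C M])

% Between-category mean: cells whose row and column categories differ
betweenMask = (CATS ~= CATS.');
betweenMean = mean(A(betweenMask), 'omitnan')
```

The per-category means agree with the values displayed above (99.4333 for category 7, 99.4833 for category 20, NaN for the two single-item categories).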

9 comments

dpb on 7 Sep 2022

Edited: dpb on 7 Sep 2022

Compare the output of the expression

logical(ixcat.*ixcat.')

to the array and you'll see it is precisely the selection that is the intersection of the same values in both directions -- the only presumption is that the categories are the same in both directions, since only the one vector is used for both. The selection is NOT the whole row/column; it's the product, a square logical addressing array the size of the array with TRUE elements at the specific intersections.

ADDENDUM

Oh. I don't recall when automatic array expansion was introduced -- with recent releases of MATLAB the above returns the same matrix as the matrix product of the column vector with its transpose. You MAY need to write the above as

logical(ixcat*ixcat.')

instead to get the matrix multiplication in earlier releases.
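A quick way to see that these forms agree is to build the mask several ways from one small vector and compare. This is a sketch with a made-up `ixcat`; the `bsxfun` line is included as an explicit-expansion fallback for releases without implicit expansion.

```matlab
ixcat = logical([0 1 1 0 1]).';                  % hypothetical column vector marking one category

m1 = logical(ixcat .* ixcat.');                  % element-wise product, implicit expansion
m2 = logical(ixcat * ixcat.');                   % outer product via matrix multiplication
m3 = ixcat & ixcat.';                            % logical AND, implicit expansion
m4 = logical(bsxfun(@times, ixcat, ixcat.'));    % explicit expansion for older releases

allSame = isequal(m1, m2, m3, m4)
```

Note that the matrix-multiplication form depends on orientation: if `ixcat` is a row vector, `ixcat*ixcat.'` is a scalar inner product rather than a square mask, so forcing a column with `ixcat(:)` first is a safe habit.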

I don't know when the 'all' syntax was introduced; the early MATLAB idiom would be (:) which returns the whole array as a vector and serves thus the same purpose as 'all'. To apply the colon reference, however, requires having a temporary variable; MATLAB doesn't support the syntax to dereference a function return. So, another idiom one will often see, particularly in older code, is the somewhat peculiar-looking

mean(mean(x))

which serves the same purpose: since mean is vectorized to return column means from a 2D array, the first call returns a vector of column means and the second then averages those to give the overall array average. The above is for a 2D array; one has to keep adding calls as the dimensionality of the array increases, of course, which is why the alternate syntax was introduced.
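The three idioms can be compared side by side on a small array. This is a sketch; the 'all' line assumes a release that supports that flag, and the second half illustrates a caveat that is not raised in the thread: once NaNs are skipped, nesting is no longer guaranteed to equal the overall mean, because columns with fewer valid entries get the same weight as full ones.

```matlab
x = magic(4);                        % any 2-D numeric array without NaN

m_nested = mean(mean(x));            % column means, then the mean of those
m_colon  = mean(x(:));               % whole array as one column vector
m_all    = mean(x, 'all');           % assumes 'all' is supported

% With NaNs and unequal column counts the nested form can differ
y = [1 NaN; 3 5];
y_nested  = mean(mean(y, 'omitnan'), 'omitnan');   % mean of column means 2 and 5
y_overall = mean(y(:), 'omitnan');                 % mean of 1, 3 and 5
```

For the similarity-matrix problem the selected cells are already returned as a vector by logical indexing, so a single call to mean is enough.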

However, if the 'all' syntax isn't supported, the 'omitnan' argument may not be either -- I don't recall (and am too lazy to go back thru the release notes to look it up) if they were introduced at the same time or not. If this is an issue, then there's a (now deprecated) family of special-purpose functions nanXXX for the various statistics, where XXX is mean, std, var, min, max, ..., that older releases can still use.
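If neither the 'omitnan' flag nor the nanXXX functions are available, the NaNs can be removed by hand before averaging. A minimal sketch with made-up values:

```matlab
v = [99.7 NaN 99.6 NaN 99.3];     % hypothetical values with NaN placeholders

m_flag = mean(v, 'omitnan');      % needs a release that accepts the 'omitnan' flag
m_mask = mean(v(~isnan(v)));      % plain logical indexing, no flag needed
```

Both lines average the same three finite values; the second form relies only on `isnan` and logical indexing.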

All these little warts and improvements, plus the fact that R2016 is now pretty old (as releases go), make me suggest you look into whether you could update your version to something closer to current.

Wendi Fellner on 10 Sep 2022

Edited: dpb on 10 Sep 2022

I have tried the 2020b version and everything in the script seems to be working until it gets to the M(i) line. I've tried with ixcat.*ixcat and also ixcat*ixcat. I'll post my code below. Perhaps I've not incorporated your code correctly. 's_matrix' is the full similarity matrix where both upper and lower triangles are included and there are no labels along the top row or left column, so the first part of my script is creating the 'label_matrix' matrix that removes the lower triangle and adds the category names. Then I use your code to try to extract and average the within-category values. (I'll also need to extract and average the between-category values at some point, but would like to solve this part first and then maybe I'll understand how to do the between-category values.) The code and then the error messages are below. Can you see where I've gone wrong?

% modify the s_matrix to remove the lower triangle and diagonal values to
% eliminate repeats
idx = ones(size(s_matrix)); %generate a matrix of ones the same size as the similarity matrix
idx = logical(triu(idx,1)); %keep only upper triangle and make into 'logical'
s_uptri_matrix = NaN(numSamples); %create a new matrix filled with NaN
s_uptri_matrix(idx) = s_matrix(idx); %create 'upper triangle' matrix with only the upper triangle values from s_matrix
% add DATA.category values to the s_uptri-matrix as row and column headers
cats = [DATA.category];
l_cats = [NaN(1); cats'];
label_matrix = [cats; s_uptri_matrix]; %add row of category numbers from ARTwarp's DATA struct
label_matrix = [l_cats, label_matrix]; %add column of category numbers transposed from ARTwarp's DATA struct
% Identify within-category values
C = unique(cats); %create vector of unique category names
M = zeros(size(C)); %create matrix of 0s that is the same size as C
for i=1:numel(M)
    ixcat = (cats == C(i)); %create an index of where the category names equal the 'for loop' counter?
    M(i) = mean(label_matrix(logical(ixcat.*ixcat.'), 'all', 'omitnan'));
end

Error when I include the period:

The logical indices in position 1 contain a true value outside of the
array bounds.
Error in sim_matrix_wf (line 42)
M(i) = mean(label_matrix(logical(ixcat.*ixcat.'), 'all',
'omitnan'));

Error when I don't include the period:

Index in position 2 exceeds array bounds (must not exceed 81).
Error in sim_matrix_wf (line 42)
M(i) = mean(label_matrix(logical(ixcat*ixcat.'), 'all', 'omitnan'));

Thanks for your help!

dpb on 10 Sep 2022

>> sim_matrix_wf
Error using load
Unable to read file 'ARTwarp095_0.mat'. No such file or directory.
Error in sim_matrix_wf (line 6)
    load ARTwarp095_0.mat; %load the .mat file that was generated in the ARTwarp run 
>> 

So, no...but it also very belligerently clear'ed my workspace....that was rude!

>> whos -file s_matrix.mat
  Name           Size            Bytes  Class     Attributes
  s_matrix      80x80            51200  double              
>>

Clearly from the above your CATS array must be wrong -- the data array is 80x80 but you're generating a reference to position 81. Ergo, it must be one element too long to match.
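Two further things may be worth checking in the M(i) line as posted; this is a reading of the code as written, not something confirmed in the thread. First, 'all' and 'omitnan' sit inside the indexing parentheses of label_matrix, so MATLAB would treat them as subscripts rather than as flags to mean, which is consistent with both messages referring to index positions. Second, a mask built from the category vector matches the data block, not the matrix with the extra label row and column. The sketch below uses small made-up data to show one arrangement that avoids both issues; `data` and `mask` are illustrative names.

```matlab
cats = [20 20 7 7];                           % hypothetical categories (row vector)
s_uptri = [NaN  99  85  86;
           NaN NaN  84  85;
           NaN NaN NaN  98;
           NaN NaN NaN NaN];                  % hypothetical upper-triangle data
label_matrix = [NaN cats; cats.' s_uptri];    % data plus label row and column

C = unique(cats);
M = zeros(size(C));
data = label_matrix(2:end, 2:end);            % strip labels so the mask size matches
for i = 1:numel(C)
    ixcat = (cats(:) == C(i));                % force a column vector
    mask  = logical(ixcat .* ixcat.');
    M(i)  = mean(data(mask), 'all', 'omitnan');   % flags are arguments of mean, not indices
end
disp([C(:) M(:)])
```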

Wendi Fellner on 10 Sep 2022

I'm sorry about that!
