Trying to average values from specific cells in a similarity matrix
I have a group of 10 vectors that represent 10 unique items I've compared to each other to assess their similarity in relation to each other. That is, they've been assigned into categories if their similarity exceeds a threshold. What I have from this process is an upper triangle similarity matrix that looks something like this where the top row and left column are the names of the categories:
       10    20    20    20    20     7     7     7     7    12
10    NaN     0     0     0     0  51.3  50.5  50.4  50.5  76.5
20    NaN   NaN  99.7  99.6  99.3  85.3  86.0  85.9  85.9     0
20    NaN   NaN   NaN  99.5  99.3  85.2  85.8  85.8  85.8     0
20    NaN   NaN   NaN   NaN  99.5  85.4  86.0  86.0  86.0     0
20    NaN   NaN   NaN   NaN   NaN  85.3  85.9  85.9  85.9     0
 7    NaN   NaN   NaN   NaN   NaN   NaN  99.2  99.0  99.2     0
 7    NaN   NaN   NaN   NaN   NaN   NaN   NaN  99.8  99.7     0
 7    NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN  99.7     0
 7    NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN     0
12    NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN   NaN
For my next step, I want to find the average similarity for items that have been placed into a category together and compare it to their similarity with items that do not share their category. That is, I want to average the similarity of the Cat20s (99.7, 99.6, 99.3, 99.5, 99.3, and 99.5) and the Cat7s (99.2, 99.0, 99.2, 99.8, 99.7, and 99.7) so that I can compare it to the similarity values of out-of-category items (0, 0, 0, 0, 51.3, 50.5, 50.4, 50.5, 76.5, 85.3, 86.0, 85.9, 85.9, 0, etc.). What I'm trying to do is assess the effectiveness of the categorization scheme.
I have tried to think through this, but I can't find an approach that I think will work. (I'm pretty new at this, so maybe there is something obvious I haven't thought of.)
Many thanks in advance!
Accepted Answer
More Answers (1)
Not too bad ... use logical addressing to find the locations, then take the mean of the values returned with the 'omitnan' argument...
Generically, you can write something like the following (either augment the array with a NaN in the (1,1) position, or build the CATS array independently as here, depending on how you have the data originally):
CATS=[10 20 20 20 20 7 7 7 7 12].';   % the categories in respective position in array
C=unique(CATS);                       % the unique categories over which to iterate
%A=A(2:end,2:end);                    % strip the label row/column, if A includes it
M=zeros(size(C));                     % how many means there are possible -- one/category
for i=1:numel(M)
  ixcat=(CATS==C(i));                 % logical index for this category -- same for rows and columns
  M(i)=mean(A(logical(ixcat.*ixcat.')),'all','omitnan'); % expand vector to logical array, select, compute
end
results in
>> disp([C M])
7.0000 99.4333
10.0000 NaN
12.0000 NaN
20.0000 99.4833
>>
In this case only two of the categories have any finite elements, but the above will work in general regardless of the size or the number of rows/columns per category. You can always retain only the finite results in the end.
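For the other half of the comparison in the question (in-category versus out-of-category), the same logical-mask idea can be pooled over all categories at once. The following is only a sketch under assumptions: it rebuilds the 10x10 matrix from the question so it runs on its own, the names sameCat, valid, inVals and outVals are made up for illustration, and bsxfun is used instead of implicit expansion so that it should also run on older releases.

```matlab
% Sketch: pooled in-category vs. out-of-category mean similarity.
% A is rebuilt from the matrix posted in the question (NaN on and below the diagonal).
CATS = [10 20 20 20 20 7 7 7 7 12].';
A = NaN(10);
A(1,2:10) = [0 0 0 0 51.3 50.5 50.4 50.5 76.5];
A(2,3:10) = [99.7 99.6 99.3 85.3 86.0 85.9 85.9 0];
A(3,4:10) = [99.5 99.3 85.2 85.8 85.8 85.8 0];
A(4,5:10) = [99.5 85.4 86.0 86.0 86.0 0];
A(5,6:10) = [85.3 85.9 85.9 85.9 0];
A(6,7:10) = [99.2 99.0 99.2 0];
A(7,8:10) = [99.8 99.7 0];
A(8,9:10) = [99.7 0];
A(9,10)   = 0;

sameCat = bsxfun(@eq, CATS, CATS.');  % true where row and column share a category
valid   = ~isnan(A);                  % only the filled upper triangle

inVals  = A(sameCat & valid);         % pairs within the same category
outVals = A(~sameCat & valid);        % pairs across different categories

inMean  = mean(inVals);
outMean = mean(outVals);
fprintf('in-category mean:     %.4f (n=%d)\n', inMean,  numel(inVals));
fprintf('out-of-category mean: %.4f (n=%d)\n', outMean, numel(outVals));
```

Whether a single pooled mean or one mean per category is the better measure of the categorization scheme depends on the application; the per-category loop above keeps them separate.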
9 Comments
Wendi Fellner
on 5 Sep 2022
Wendi Fellner
on 7 Sep 2022
Compare the output of the expression
logical(ixcat.*ixcat.')
to the array and you'll see it is precisely the selection that is the intersection of the same values in both directions -- the only presumption is that the categories are the same in both directions, since only the one vector is used for both. The selection is NOT the whole row/column; it's the product, a square logical addressing array the size of the data array with TRUE elements at the specific intersections.
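A tiny made-up example may make the shape of that mask easier to see. The four-element cats vector below is hypothetical and has nothing to do with the real data; the conversion to double before multiplying is just a precaution, not something the original code requires.

```matlab
% Hypothetical 4-item example, only to show what the selection mask looks like.
cats  = [20 20 7 20].';
ixcat = (cats == 20);                               % 4x1 logical: items 1, 2 and 4
mask  = logical(double(ixcat) * double(ixcat.'));   % outer product -> 4x4 logical
disp(mask)
% Rows/columns 1, 2 and 4 intersect; row 3 and column 3 stay false.
```

With an upper-triangle matrix, the diagonal and lower-triangle positions selected by the mask hold NaN, which is why the mean is taken with 'omitnan'.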
ADDENDUM
Oh. I don't recall when the automatic array expansion was introduced -- with it, the element-wise product of the column vector and the row vector above returns the same matrix as the matrix (outer) product in recent releases of MATLAB. You MAY need to write the above as
logical(ixcat*ixcat.')
instead to get the matrix multiplication in earlier releases.
I don't know when the 'all' syntax was introduced; the early MATLAB idiom would be (:), which returns the whole array as a vector and thus serves the same purpose as 'all'. To apply the colon reference, however, requires having a temporary variable; MATLAB doesn't support the syntax to dereference a function return. So, another idiom one will often see, particularly in older code, is the somewhat peculiar-looking
mean(mean(x))
which serves the same purpose: since mean is vectorized to return column means from a 2D array, the first call returns a vector, and the second then averages those column means for the overall array average. The above is for a 2D array; one has to continue to add calls as the dimensionality of the array increases, of course, which is why the alternate syntax was introduced.
However, if the 'all' syntax isn't supported, the 'omitnan' argument may not be either -- I don't recall (and am too lazy to go back through the release notes to look it up) if they were introduced at the same time or not. If this is an issue, then there's a (now deprecated) family of special-purpose functions nanXXX for the various statistics, where XXX is mean, std, var, min, max, ..., that older releases can still use.
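Putting those fallbacks together, a version of the loop that avoids implicit expansion, 'all' and 'omitnan' altogether might look like the sketch below. The small A and CATS are made-up stand-ins, not the poster's data, and the NaN filtering with isnan is one possible substitute for 'omitnan', not the only one.

```matlab
% Sketch for older releases: no implicit expansion, no 'all', no 'omitnan'.
% A and CATS are small hypothetical stand-ins for the real data.
CATS = [20 20 7 20].';
A = [NaN 99.7 85.3 99.6
     NaN NaN  86.0 99.5
     NaN NaN  NaN  85.2
     NaN NaN  NaN  NaN];
C = unique(CATS);
M = NaN(size(C));                       % NaN marks categories with no pairs
for i = 1:numel(C)
    ixcat = double(CATS == C(i));       % column vector of 0/1
    mask  = logical(ixcat * ixcat.');   % explicit outer product
    vals  = A(mask);                    % logical indexing returns a column vector
    vals  = vals(~isnan(vals));         % manual replacement for 'omitnan'
    if ~isempty(vals)
        M(i) = mean(vals);
    end
end
disp([C M])
```

Because logical indexing already returns a vector, neither (:) nor mean(mean(x)) is needed in this form.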
All these little warts and improvements, and the fact that R2016 is now pretty old (as releases go), make me suggest you look into whether you could update your version to something closer to current.
Wendi Fellner
on 10 Sep 2022
Edited: dpb
on 10 Sep 2022
dpb
on 10 Sep 2022
The code will work as written given the assumptions made... I can't say anything more about the problem without the data to go with it, though.
It's all dependent upon the CATS array matching up to the data array sizes -- if they're consistent there can't be an array index out of bounds, because the indexing logical vector can't be longer than the size of the array. Again, of course, the array also has to be square.
Wendi Fellner
on 10 Sep 2022
Wendi Fellner
on 10 Sep 2022
dpb
on 10 Sep 2022
>> sim_matrix_wf
Error using load
Unable to read file 'ARTwarp095_0.mat'. No such file or directory.
Error in sim_matrix_wf (line 6)
load ARTwarp095_0.mat; %load the .mat file that was generated in the ARTwarp run
>>
So, no...but it also very belligerently clear'ed my workspace....that was rude!
>> whos -file s_matrix.mat
Name Size Bytes Class Attributes
s_matrix 80x80 51200 double
>>
Clearly from the above your CATS array must be wrong -- the data array is 80x80 but you're generating a reference to position 81. Ergo, it must be one element too long to match.
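A cheap guard before the loop would catch this kind of mismatch early. The sketch below uses placeholders only: an 80x80 NaN array stands in for s_matrix and an 81-element vector stands in for the offending CATS, since the actual category vector isn't shown in the thread.

```matlab
% Sketch of a size check; both variables are placeholders, not the poster's data.
s_matrix = NaN(80);        % stand-in for the 80x80 array in s_matrix.mat
CATS     = ones(81,1);     % deliberately one element too long

% The data array must be square and CATS must have one entry per row/column.
sizesMatch = size(s_matrix,1) == size(s_matrix,2) && ...
             numel(CATS) == size(s_matrix,1);
if ~sizesMatch
    fprintf('CATS has %d entries but the matrix is %dx%d\n', ...
            numel(CATS), size(s_matrix,1), size(s_matrix,2));
end
```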
Wendi Fellner
on 10 Sep 2022