How to split a matrix into subsets based on its first 3 columns?
Clarisha Nijman
on 1 Nov 2018
Commented: Clarisha Nijman
on 3 Nov 2018
Hello, I am trying to find subsets (sub-matrices) of matrix A based on the first 3 columns, and then compute probabilities. For such a small task, the code I wrote looks tremendously long, and the results are not good at all! Is there a better way to do this in MATLAB? Working with for loops and while loops is very difficult for me.
%given matrix
A=[ 1 2 3 2 3 4;
1 2 3 3 2 4;
1 2 3 2 3 4;
2 3 4 1 2 3;
2 3 4 2 3 4;
1 2 3 3 4 2;
1 4 3 2 3 4;
1 3 4 3 2 4;
1 4 3 1 2 3;
2 3 4 1 2 3];
%Subsets: rows grouped by identical first 3 columns, A(i,1:3) == A(j,1:3). B should be:
(This part of the code works!)
1 2 3 2 3 4;
1 2 3 3 2 4;
1 2 3 2 3 4;
1 2 3 3 4 2;
2 3 4 1 2 3;
2 3 4 2 3 4;
2 3 4 1 2 3;
1 4 3 2 3 4;
1 4 3 1 2 3;
1 3 4 3 2 4;
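For comparison, B can be built without any loops. This is a minimal sketch, assuming subsets should be listed in the order in which their first 3 columns first appear in A (which matches the B shown above). The tie-breaking column of row indices is an addition for illustration, so rows inside a subset keep their original order.

```matlab
% Matrix from the question
A = [1 2 3 2 3 4; 1 2 3 3 2 4; 1 2 3 2 3 4; 2 3 4 1 2 3; 2 3 4 2 3 4;
     1 2 3 3 4 2; 1 4 3 2 3 4; 1 3 4 3 2 4; 1 4 3 1 2 3; 2 3 4 1 2 3];

% Group id per row, numbered in order of first appearance of A(:,1:3)
[~, ~, g] = unique(A(:, 1:3), 'rows', 'stable');

% Sort by group id; the original row index breaks ties,
% so rows within a subset stay in their original order
[~, order] = sortrows([g, (1:size(A, 1))']);
B = A(order, :);
disp(B)
```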
%Final result matrix C, with the probability of each row within its subset, should be:
This is my problem: how do I compute the correct probabilities?
size(B,1)=4
1 2 3 2 3 4 2/4;
1 2 3 3 2 4 1/4;
1 2 3 3 4 2 1/4;
size(B,1)=3
2 3 4 1 2 3 2/3;
2 3 4 2 3 4 1/3;
size(B,1)=2
1 4 3 2 3 4 1/2;
1 4 3 1 2 3 1/2;
size(B,1)=1
1 3 4 3 2 4 1;
The code:
%add column to matrix for indicator variable
indicator=zeros(size( A,1),1);
A=[A indicator];
for i=1:size(A,1)
    if A(i,size(A,2))==0 %consider only not adjusted indicators
        k=0;
        while i+k<=size(A,1)%takes care that index is not exceeded
            if A(i,1:3)==A(i+k,1:3)
                A(i+k,size(A,2))=i;%indicator variable
            end
            k=k+1;
        end
    end
end
%add column to matrix for frequency in the subset
freq=zeros(size( A,1),1);
A=[A freq];
%start subsetting and compute the pdf
j=1;
while j<=max(A(:,size(A,2)-1))
    B=A(A(:,size(A,2)-1)==j,:);%save the j-th subset in B
    for i=1:size(B,1)
        if B(i,size(B,2))==0 %consider only not adjusted indicators
            k=0;
            while i+k<=size(B,1)%takes care that index is not exceeded
                if B(i,1:6)==B(i+k,1:6)
                    B(i+k,size(B,2))=i;%indicator variable
                    B
                    %subsetting to find frequencies
                    for v=1:max(B(:,size(B,2)))
                        C=B(B(:,size(B,2))==v,:);%save the j-th subset in B
                        %computing probability of each element in subset
                        for w=1:size(C,1)
                            C(w,size(C,2))= 1/ C(w,size(C,1));
                            C
                        end
                        for w=1:size(C,1)
                            z=1;
                            while z+w<size(C,1)
                                if C(w,1:6)==C(w+z,1:6)
                                    C(w,size(C,2))=C(w,size(C,2))+C(w+z,size(C,2));
                                    C(w+z,size(C,2))=0;
                                end
                                z=z+1;
                            end
                            %remove lines with probability zero
                            % Specify conditions, which rows should be
                            % removed
                            weg = C(:,size(C,2))==0;
                            % remove
                            C(weg,:) = [];
                            E=[E;C];
                        end
                    end
                end
                k=k+1;
            end
        end
    end
    j=j+1;
end
3 Comments
JohnGalt
on 1 Nov 2018
Agreed with Bruno. "Hello, I am trying to find subsets/matrices in matrix A, based on the first 3 columns, and then computing probabilities": find sub-matrices of what form? Compute probabilities of what?
Guillaume
on 1 Nov 2018
My understanding is that all rows with identical columns 1 to 3 belong to a subset. The probability of a row is the number of times it appears in the matrix divided by the number of rows in the subset it belongs to.
I too have not tried to understand the code.
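Written as a formula (the notation here is introduced only for illustration): for a row r of A, let n(r) be the number of rows of A identical to r, and S(r) the subset of rows that share r's first 3 columns. Two worked values from the matrix in the question:

```latex
P(r) = \frac{n(r)}{\lvert S(r) \rvert}, \qquad
P\bigl((1,2,3,2,3,4)\bigr) = \frac{2}{4} = \frac{1}{2}, \qquad
P\bigl((2,3,4,1,2,3)\bigr) = \frac{2}{3}.
```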
Accepted Answer
Guillaume
on 1 Nov 2018
Edited: Guillaume
on 2 Nov 2018
If I understood correctly:
A=[ 1 2 3 2 3 4;
1 2 3 3 2 4;
1 2 3 2 3 4;
2 3 4 1 2 3;
2 3 4 2 3 4;
1 2 3 3 4 2;
1 4 3 2 3 4;
1 3 4 3 2 4;
1 4 3 1 2 3;
2 3 4 1 2 3];
[~, ~, uid] = unique(A, 'rows'); %get unique id for each row of A
count = accumarray(uid, 1); %get count of how many times each unique row of A appear
count = count(uid); %and assign to each row
[~, ~, subset] = unique(A(:, 1:3), 'rows'); %identify which subset each row belongs to
subsetcount = accumarray(subset, 1); %count the number of rows in each unique subset
subsetcount = subsetcount(subset); %and assign to each row
probability = count ./ subsetcount; %calculate the probability of each row in its subset
%for pretty display
table(A, subset, probability)
I'm using accumarray to compute histograms. If it's clearer for you, you could replace each instance of accumarray(x, 1) with histcounts(x, 'BinMethod', 'integers')' (note the transpose: histcounts returns a row vector).
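To see that the two are interchangeable here, a small sketch with a made-up id vector uid (hypothetical data, not from the question). It assumes, as holds for the third output of unique, that the smallest id is 1: accumarray always counts from bin 1, while histcounts with integer bins starts at min(uid), so the results only line up under that assumption.

```matlab
% Hypothetical ids, shaped like the third output of unique (column, starting at 1)
uid = [1; 2; 1; 3; 4; 2; 3];

% accumarray adds a 1 into bin uid(k) for every k: an integer histogram (column)
countsAccum = accumarray(uid, 1);

% histcounts with one bin per integer gives the same counts as a row vector,
% so transpose it to get a column
countsHist = histcounts(uid, 'BinMethod', 'integers')';

disp([countsAccum countsHist])
```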
4 Comments
Guillaume
on 2 Nov 2018
You'll notice I used meaningful names in my answer. I have no idea what D, E, F are in your code. Code whose variables have meaningful names is instantly easier to understand.
Note that the sort in unique(sort(x)) is pointless. unique does a sort anyway, unless you use the 'stable' option.
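A quick illustration of that point on a made-up vector x (hypothetical data): the default output of unique is already sorted, so pre-sorting changes nothing, while 'stable' keeps the order of first appearance.

```matlab
x = [3 1 2 3 1];

u1 = unique(x);            % sorted unique values
u2 = unique(sort(x));      % same result: the extra sort is redundant
u3 = unique(x, 'stable');  % unique values in order of first appearance

disp(u1); disp(u2); disp(u3)
```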
If you don't want the repeated rows in each subset, one method:
[rows, urow, uid] = unique(A, 'rows'); %get unique rows, where they come from, and unique id for each
count = accumarray(uid, 1); %histogram of rows, matches the rows variable
[~, ~, subset] = unique(A(:, 1:3), 'rows'); %identify which subset each row belongs to
subsetcount = accumarray(subset, 1); %count the number of rows in each unique subset
subsetcount = subsetcount(subset); %and assign to each row
probability = count ./ subsetcount(urow);
%for pretty display
subset = subset(urow);
table(rows, subset, probability)
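A self-contained check of the snippet above: it repeats the same steps on the question's matrix, then confirms that the probabilities within each subset add up to 1. The total variable and the sanity check are additions for illustration, not part of the original answer.

```matlab
% Matrix from the question
A = [1 2 3 2 3 4; 1 2 3 3 2 4; 1 2 3 2 3 4; 2 3 4 1 2 3; 2 3 4 2 3 4;
     1 2 3 3 4 2; 1 4 3 2 3 4; 1 3 4 3 2 4; 1 4 3 1 2 3; 2 3 4 1 2 3];

% Same steps as the snippet above
[rows, urow, uid] = unique(A, 'rows');          % unique rows (sorted)
count = accumarray(uid, 1);                     % occurrences of each unique row
[~, ~, subset] = unique(A(:, 1:3), 'rows');     % subset id of each row of A
subsetcount = accumarray(subset, 1);            % size of each subset
subsetcount = subsetcount(subset);              % subset size for each row of A
probability = count ./ subsetcount(urow);       % probability of each unique row
subset = subset(urow);                          % subset id of each unique row

% Sanity check: the probabilities in each subset should sum to 1
total = accumarray(subset, probability);
disp(table(rows, subset, probability))
```

For this matrix there are 8 unique rows; the subset with first columns 2 3 4 gets 2/3 and 1/3, because 2 3 4 1 2 3 occurs twice among its 3 rows.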
More Answers (0)