Find Optimal Number of Clusters Using the Silhouette Criterion from Scratch in MATLAB
Hammad Younas
on 15 Feb 2023
Commented: Gian23
on 16 Feb 2023
Hello, I hope you are doing well. I am trying to find the optimal number of clusters using evalclusters with k-means and the silhouette criterion.
The built-in command takes a very long time to find the optimal number of clusters, so I am implementing this method from scratch. I have the following code, but the scores obtained by the from-scratch algorithm differ from those of the built-in function.
The dataset and the built-in function call are in the following section. evaluation.CriterionValues holds the silhouette score for each candidate K.
x =[ [0.1 0.2 0.15 0.2 0.21 ] 1+[0.1 0.2 0.15 0.2 0.21 ]];
y =[ [0.1 0.2 0.15 0.2 0.21 ] 1+[0.1 0.2 0.15 0.2 0.21 ]];
X = [x.' y.'];
dataset_len = size(X,1);
num_kmeans = 6;
%%
evaluation = evalclusters(X,"kmeans","silhouette","KList",1:num_kmeans)
evaluation.CriterionValues
Here is the code to implement this from scratch. array_silhoutte holds the score for each candidate K.
array_silhoutte = zeros(1,num_kmeans);
distance_a = [];
distance_b = [];
for j=1:num_kmeans
    [cluster_assignments,centroids] = kmeans(X,j,'Distance','sqeuclidean','Start','sample');
    %[~,grps_11]=grp2idx(cluster_assignments);
    for i = 1:dataset_len
        distance_a = [];
        distance_b = [];
        current_datapoint = X(i,:);
        for k=1:dataset_len
            if i~=k
                if (cluster_assignments(i)== cluster_assignments(k))
                    dist = pdist2( current_datapoint,X(k,:),'squaredeuclidean') ;
                    distance_a = [distance_a;dist];
                else
                    dist = pdist2( current_datapoint,X(k,:),'squaredeuclidean') ;
                    distance_b=[distance_b;dist];
                end
            end
        end
        Average_a=mean(distance_a);
        Average_b=mean(distance_b);
    end
    array_silhoutte(j) = (Average_b-Average_a)./max(Average_b, Average_a);
end
Can anybody help me make the from-scratch scores match those of the built-in function?
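For reference, the silhouette criterion under its usual definition can be written as below. The notation is introduced here for illustration and does not appear in the post: c(i) is the cluster of point i, C_m is the set of points in cluster m, d is the squared Euclidean distance, and n is the number of points. Comparing the loop above against this definition shows two differences: Average_a and Average_b are overwritten on every pass of the i loop, so array_silhoutte(j) reflects only the last data point rather than the mean over all points, and distance_b pools every other cluster instead of taking the minimum of the per-cluster averages.

```latex
\begin{aligned}
a(i) &= \frac{1}{|C_{c(i)}| - 1} \sum_{k \in C_{c(i)},\, k \neq i} d(x_i, x_k), \\
b(i) &= \min_{m \neq c(i)} \frac{1}{|C_m|} \sum_{k \in C_m} d(x_i, x_k), \\
s(i) &= \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad
\bar{s}_K = \frac{1}{n} \sum_{i=1}^{n} s(i).
\end{aligned}
```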
Accepted Answer
Marco Riani
on 16 Feb 2023
Edited: Marco Riani
on 16 Feb 2023
x =[ [0.1 0.2 0.15 0.2 0.21 ] 1+[0.1 0.2 0.15 0.2 0.21 ]];
y =[ [0.1 0.2 0.15 0.2 0.21 ] 1+[0.1 0.2 0.15 0.2 0.21 ]];
X = [x.' y.'];
dataset_len = size(X,1);
num_kmeans = 6;
evaluation = evalclusters(X,"kmeans","silhouette","KList",1:num_kmeans)
disp("Criterion values from evalclusters")
disp(evaluation.CriterionValues)
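array_silhoutte = zeros(1,num_kmeans);
for j=1:num_kmeans
    % [cluster_assignments,centroids] = kmeans(X,j,'Distance','sqeuclidean','Start','sample');
    % Use many random restarts so the partition does not depend on one start
    [cluster_assignments,centroids] = kmeans(X,j,'Replicates',100);
    % Average squared distance from each point to its own cluster
    avgDWithin=zeros(dataset_len,1);
    % Average squared distance from each point to every other cluster
    % (the entry for a point's own cluster stays Inf)
    avgDBetween=Inf(dataset_len,j);
    for i=1:dataset_len
        for jj=1:j
            % Points sharing the cluster of point i (including i itself)
            boo=cluster_assignments==cluster_assignments(i);
            Xsamecluster=X(boo,:);
            if size(Xsamecluster,1)>1
                % Distance of i to itself is 0, so divide by (cluster size - 1)
                avgDWithin(i)=sum(sum((X(i,:)-Xsamecluster).^2,2))/(size(Xsamecluster,1)-1);
            end
            % Points in cluster jj, provided jj is not the cluster of point i
            boo1= cluster_assignments~=cluster_assignments(i);
            Xdifferentcluster=X(boo1 & cluster_assignments ==jj,:);
            if ~isempty(Xdifferentcluster)
                avgDBetween(i,jj)=mean(sum((X(i,:)-Xdifferentcluster).^2,2));
            end
        end
    end
    % Calculate the silhouette values
    % For j = 1 there is no other cluster, so Inf/Inf yields NaN
    minavgDBetween = min(avgDBetween, [], 2);
    silh = (minavgDBetween - avgDWithin) ./ max(avgDWithin,minavgDBetween);
    % Criterion value for this j: mean silhouette over all points
    array_silhoutte(j) =mean(silh);
end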
disp("Criterion values computed manually")
disp(array_silhoutte)
I slightly rewrote your code and added 'Replicates',100 to the call to kmeans. Please let me know whether everything is clear now. Of course, kmeans does not take the correlation among the variables into account and is not robust to atypical observations, but that is another story.
Best
Marco
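As a further cross-check that is not part of the answer above, the per-point values can also be obtained from the toolbox function silhouette and averaged. This is a minimal sketch, assuming the Statistics and Machine Learning Toolbox is installed; the variable name mean_silh and the seed passed to rng are illustrative choices, not taken from the thread.

```matlab
% Cross-check sketch: average the per-point values returned by silhouette.
% Assumes the Statistics and Machine Learning Toolbox is available.
rng(0);  % arbitrary seed so that kmeans is reproducible

% Same 10-by-2 dataset as in the thread (x and y are identical there)
x = [[0.1 0.2 0.15 0.2 0.21] 1+[0.1 0.2 0.15 0.2 0.21]];
X = [x.' x.'];
num_kmeans = 6;

% K = 1 has no neighbouring cluster, so that entry is left as NaN
mean_silh = nan(1, num_kmeans);
for K = 2:num_kmeans
    idx = kmeans(X, K, 'Replicates', 100);
    % Requesting an output returns one silhouette value per point
    s = silhouette(X, idx, 'sqEuclidean');
    mean_silh(K) = mean(s);
end
disp(mean_silh)
```

If the partitions found by kmeans coincide, these averages should agree with both evaluation.CriterionValues and the manually computed array_silhoutte; small differences for larger K can come from kmeans settling on different partitions between runs.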