# K-means Clustering with Euclidean Distance

32 views (last 30 days)
Saeed Siddiqui on 4 Jan 2021
Answered: Rishabh Mishra on 7 Jan 2021
I have 1,000 data values and I want to do k-means clustering with 10 given centroids, so the starting points are not random. The Euclidean distance for the data values needs to be equal to or less than 0.01, so my k-means clustering needs several iterations, with each iteration using the latest centroids.
My current code does the first iteration: it works out the new centroids (C), and I manually work out the Euclidean distance.
My question -- How do I make this repeat so that I get more iterations (an unknown number) and carry on until the Euclidean distance is equal to or less than 0.01? Also, is there a better way to calculate the Euclidean distance on each iteration?
The data is one-dimensional!
```matlab
X = importdata('data');
Centroid = importdata('Centroids');
ep = 0.01;
% C is the matrix of new centroid values
% grp is the index of the centroid assigned to each data point
[grp, C] = kmeans(X, 10, "Start", Centroid);
% Calculating the Euclidean distance manually
% get the original centroid value corresponding to each data point
d = Centroid(grp);
% dist is the distance of each data value from its centroid
dist = X - d;
% mean Euclidean distance (the data is 1-D, so |dist| is the Euclidean distance)
euclidean = sum(abs(dist)) / 1000
```
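One possible way to get an explicit repeat-until-tolerance loop is to run kmeans a single Lloyd step at a time, feeding the latest centroids back in as the start points. This is only a sketch under the assumptions in the question (X is an N-by-1 vector, 10 starting centroids); note that 'MaxIter',1 makes kmeans warn that it failed to converge, which is expected here:

```matlab
% Sketch: repeat single k-means steps, seeding each run with the latest
% centroids, until the mean distance drops to the tolerance.
X = importdata('data');          % N-by-1 data values (assumed)
C = importdata('Centroids');     % 10-by-1 starting centroids (assumed)
ep = 0.01;
for it = 1:1000                  % safety cap so the loop cannot run forever
    % one Lloyd update, starting from the latest centroids
    [grp, C] = kmeans(X, 10, 'Start', C, 'MaxIter', 1);
    % data is 1-D, so |x - c| is the Euclidean distance to the assigned centroid
    meanDist = mean(abs(X - C(grp)));
    if meanDist <= ep
        break                    % tolerance reached
    end
end
meanDist
```

Be aware that if the 10 clusters can never bring the mean distance down to 0.01 for your data, the loop only stops at the iteration cap, so check the final meanDist after the loop.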

Rishabh Mishra on 7 Jan 2021
Hi,
Based on my understanding of the issue you described, I would like to highlight a few points:
1. k-means clustering (Lloyd's algorithm) is an iterative, data-partitioning algorithm. No further explicit iterations are required; you can simply use the 'kmeans' function as it is.
2. The cluster centres (or centroids) are obtained after several iterations, such that the distances from the points in each cluster to their cluster centre are minimized.
3. The full output of the 'kmeans' function is [idx, C, sumd, D], where D is an n-by-k matrix storing the distance from every point to every cluster centre (squared Euclidean distances under the default 'sqeuclidean' metric).
4. Since you are using a predefined number of cluster centres (k = 10), the cluster centres obtained are the best fit with minimized distances. However, this does not guarantee that the distance between each point and its corresponding cluster centre falls below 0.01.
5. Increasing the number of cluster centres may or may not reduce the distances below 0.01. As the number of cluster centres approaches the number of observation points, the Euclidean distances approach 0; when the number of cluster centres equals the number of observation points, the distances become exactly 0.
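The D output mentioned in point 3 gives you the distance check without any manual computation. A sketch, reusing the variable names from the question (the sqrt is needed because D holds squared distances under the default metric):

```matlab
% Fourth output D: n-by-k matrix of point-to-centroid distances
[idx, C, sumd, D] = kmeans(X, 10, 'Start', Centroid);
% D(i,j) is the squared distance from point i to centroid j (default
% 'sqeuclidean' metric), so the distance from each point to its own
% assigned centroid is the row-wise minimum:
ownDist = sqrt(min(D, [], 2));
meanDist = mean(ownDist)    % compare against the 0.01 tolerance
```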
Hope this helps