- k-means clustering (Lloyd’s algorithm) is an iterative data-partitioning algorithm. The ‘kmeans’ function already runs these iterations internally, so you do not need to write an explicit loop; you can simply use the function as it is.
- The cluster centres (or centroids) are obtained after several iterations. On convergence, each point is assigned to its nearest cluster centre, and the total distance from the points to their cluster centres is at a (local) minimum.
- The output of the ‘kmeans’ function is [idx, C, sumd, D], where D is a matrix that stores the distances from every point to every cluster centre. With the default ‘sqeuclidean’ distance metric, these are squared Euclidean distances.
- Since you are using a predefined number of cluster centres (k = 10), the cluster centres obtained are the best fit that ‘kmeans’ finds for that k. However, this does not guarantee that the distance between the points and their corresponding cluster centres falls below 0.01.
- On increasing the number of cluster centres further, the distance may or may not drop below 0.01. As the number of cluster centres approaches the number of observation points, the Euclidean distances approach 0. When the number of cluster centres equals the number of observation points, the Euclidean distances become 0.
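The points above can be put together in a short sketch. It is only an illustration under stated assumptions: the data here is synthetic (uniform random values standing in for the ‘data’ and ‘Centroids’ files), the tolerance is applied to the mean point-to-centre distance, and the variable names distToOwn and meanDist are made up for this example. It needs the Statistics and Machine Learning Toolbox for ‘kmeans’.

```matlab
% Synthetic stand-ins for the real files (assumption: X is a 1000-by-1
% column vector and Centroid is a 10-by-1 column vector of start values).
rng(0);
X = rand(1000, 1);
Centroid = linspace(0.05, 0.95, 10)';
ep = 0.01;

% kmeans iterates internally until it converges (or hits 'MaxIter'),
% so a single call already returns the final centroids.
[idx, C, sumd, D] = kmeans(X, 10, 'Start', Centroid);

% D(i,j) is the distance from point i to centroid j. With the default
% 'sqeuclidean' metric it is squared, so take the square root.
n = size(X, 1);
distToOwn = sqrt(D(sub2ind(size(D), (1:n)', idx)));
meanDist = mean(distToOwn);

% If the tolerance is not met with k = 10, try larger k. The supplied
% start centroids only fit k = 10, so later runs use the default
% initialisation with a few replicates (an assumption of this sketch).
k = 10;
while meanDist > ep && k < n
    k = k + 1;
    [idx, C, ~, D] = kmeans(X, k, 'Replicates', 5);
    distToOwn = sqrt(D(sub2ind(size(D), (1:n)', idx)));
    meanDist = mean(distToOwn);
end
fprintf('k = %d, mean distance = %.4f\n', k, meanDist);
```

Reading the distances out of D avoids recomputing them by hand, and the loop makes explicit that the only way to push the distance below a fixed tolerance is to allow more cluster centres.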
K-means Clustering with Euclidean Distance
I have 1,000 data values and I want to do k-means clustering with 10 given centroids, so the start is not random. The Euclidean distance for the data values needs to be less than or equal to 0.01. Therefore I need my k-means clustering to run several iterations, with each iteration using the latest centroids.
My current code does the first iteration: it works out the new centroids (C), and I manually work out the Euclidean distance.
My question: how do I make this repeat so that I get more iterations (an unknown number) and carry on until the Euclidean distance is less than or equal to 0.01? Also, is there a better way to calculate the Euclidean distance for each iteration?
The data is one-dimensional!
% Load the data
X = importdata('data');
Centroid = importdata('Centroids');
ep = 0.01;
% C is the new centroid values
% grp is the index of the centroid each data value is assigned to
[grp,C] = kmeans(X,10,"Start",Centroid);
% Calculating Euclidean distance manually
% Get the centroid corresponding to each data value (via grp)
% dist is the distance of each data value from its centroid
dist = X - C(grp);
% Calculate the mean Euclidean distance
euclidean = sum(abs(dist))/numel(X)
Rishabh Mishra on 7 Jan 2021
Based on my understanding of the issue you described, I would like to highlight a few points:
Hope this helps