K-means clustering with k=10

I have a 256 x 1707 matrix that I want to cluster with k-means (k=10) and then plot. How can I do this?
I appreciate any help you can provide.

Answers (1)

njj1 on 18 Apr 2018
Edited: njj1 on 18 Apr 2018

0 votes

1) Randomly initialize 10 cluster centroids. This can be done by simply randomly selecting 10 points from your dataset.

2) Compute the distance (Euclidean, presumably) from each data point to these 10 centroids.

3) Assign each point to the cluster whose centroid is the closest.

4) Re-compute the centroid of each cluster as the mean of the points assigned to it.

5) Repeat steps 2 to 4 using the updated centroids.

6) Stop when the cluster assignments no longer change (or after a fixed number of iterations).
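
The steps above can be sketched in plain MATLAB roughly as follows. This is only an illustration, not how the built-in kmeans function is implemented: the synthetic data, the choice of k = 3, and the cap of 100 iterations are all assumptions made for the example.

```matlab
% Minimal k-means (Lloyd's algorithm) sketch; X is n-by-p (rows = observations).
% The synthetic data below is an assumption, used only so the sketch runs.
rng(1);                                   % fixed seed so the sketch is repeatable
X = [randn(50,2); randn(50,2)+5; [randn(50,1)+5, randn(50,1)-5]];
k = 3;                                    % the question uses k = 10
n = size(X,1);

C = X(randperm(n,k),:);                   % step 1: k distinct data points as centroids
idx = zeros(n,1);
for iter = 1:100                          % arbitrary cap on the number of iterations
    % step 2: squared Euclidean distance from every point to every centroid
    D = zeros(n,k);
    for j = 1:k
        d = X - repmat(C(j,:),n,1);
        D(:,j) = sum(d.^2,2);
    end
    % step 3: assign each point to its nearest centroid
    [~,newIdx] = min(D,[],2);
    if isequal(newIdx,idx)
        break                             % assignments stopped changing
    end
    idx = newIdx;
    % step 4: recompute each centroid as the mean of its members
    for j = 1:k
        if any(idx==j)                    % an empty cluster keeps its old centroid
            C(j,:) = mean(X(idx==j,:),1);
        end
    end
end
```

After the loop, idx holds one cluster label per row of X and C holds one centroid per row, which is the same layout that the plotting code below expects.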

Plotting:

% matrix: n-by-p data, cluster: n-by-1 labels, dim1/dim2: the two columns to plot
for i=1:10
    plot(matrix(cluster==i,dim1),matrix(cluster==i,dim2),'o')
    hold on
end
hold off

In this plot, you have to choose two dimensions to plot against each other. From the looks of it, you have either 256 or 1707 dimensions (aka features).

17 Comments

Ali Ali on 18 Apr 2018
I wrote this code, but something seems to be missing. How do I label the clusters and continue up to 10? Is it done by iteration or one by one?
opts = statset('Display','final');
[idx,C] = kmeans(X,10,'Distance','sqeuclidean','Replicates',12,'Options',opts);
%%Plotting
for i=1:10
plot(X(idx==i,1),X(idx==i,2),'o')
hold on
end
plot(C(:,1),C(:,2),'kx','MarkerSize',8,'LineWidth',2)
hold off
title 'K-means with 10 Clusters and Centroids'
njj1 on 18 Apr 2018
It appears that your cluster plotting code is OK. The centroids part is not quite correct. Try something like this:
for i=1:10
plot(X(idx==i,1),X(idx==i,2),'o')
hold on
plot(C(i,1),C(i,2),'kx','MarkerSize',8)
end
hold off
title('K-means with 10 Clusters and Centroids')
The 'LineWidth' property is not necessary when you only plot points.
Ali Ali on 18 Apr 2018
Yes, but it doesn't appear like that clustering groups of dots? I would appreciate it if you could help me with this.
njj1 on 18 Apr 2018
Edited: njj1 on 18 Apr 2018
OK, first, I was wrong about the 'LineWidth' property when you plotted your centroids. You can and should use this when plotting the centroids.
Second, I'm not sure what you mean by "it doesn't appear like that clustering groups of dots". Are there any dots plotting? I've just replicated this code for a simpler dataset and it seems to be working fine. Here's what my code looks like right now. Bear in mind that the data matrix, X, should be laid out as an n x p matrix, where n is the number of observations and p is the number of dimensions/features.
[idx,C] = kmeans(X,10,'Distance','sqeuclidean','replicates',12);
for i=1:10
plot(X(idx==i,1),X(idx==i,2),'o')
hold on
plot(C(i,1),C(i,2),'kx','markersize',8,'linewidth',2)
end
Ali Ali on 18 Apr 2018
Edited: Ali Ali on 18 Apr 2018
I meant something like the attached picture.
njj1 on 18 Apr 2018
If you would like a plot like this, then there are a few changes we can make.
for i=1:10
plot(X(idx==i,1),X(idx==i,2),'.','MarkerSize',9)
hold on
plot(C(i,1),C(i,2),'kx','markersize',8,'linewidth',2)
end
You can change the property 'MarkerSize' in the first plot() call if you want the dots to be larger.
However, judging from the plot you attached, there are only 5 clusters... Is this what you want or is this wrong?
Ali Ali on 18 Apr 2018
Don't worry about the picture; it was just to convey the idea. I'm still not convinced by the result. Please see what I got.
And why do the centroids change on every run? The values in the matrix are fixed!
njj1 on 18 Apr 2018
Unfortunately, your data is quite high-dimensional, which means that picking any 2 dimensions for plotting is very likely to produce an odd-looking plot.
K-means is an iterative optimization algorithm that converges to a local, not necessarily global, optimum. Further, each of your 'replicates' starts the centroids at different randomly selected locations. Because these initial conditions vary, a run can settle in a different local optimum, so the centroids can change location each time you run k-means.
A further difficulty comes from the high dimensionality of your data. Look up "curse of dimensionality" to get some understanding of why working in high dimensions can be tricky.
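If you want the same centroids on every run, one option is to reset the random number generator before calling kmeans. The sketch below uses a small synthetic matrix as a stand-in for your data (an assumption), and the seed value 42 is arbitrary.

```matlab
% Synthetic stand-in data (assumption); rows are observations.
X = [randn(100,2); randn(100,2)+4];

rng(42);                                  % arbitrary seed
[idx1,C1] = kmeans(X,2,'Replicates',5);

rng(42);                                  % same seed, so the same random starts
[idx2,C2] = kmeans(X,2,'Replicates',5);

sameResult = isequal(C1,C2) && isequal(idx1,idx2);
```

With the generator reset to the same state before each call, both runs should start from the same initial centroids and sameResult is expected to be true. Without the rng calls the two runs may differ.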
njj1 on 18 Apr 2018
I also encourage you to make sure that you are using the correct parts of your data matrix. You said your matrix was 256 x 1707. Are the rows the observations or the columns? My guess is that you have 1707 observations, each of which has 256 dimensions/features. If so, you need to input the transpose of your matrix into kmeans, e.g.,
[idx,C] = kmeans(X',10,'Distance','sqeuclidean','Replicates',12,'Options',opts);
Ali Ali on 18 Apr 2018
Yes, got it. I really appreciate your help.
Ali Ali on 18 Apr 2018
Yes, it is 256 x 1707, and when I transposed the input, I got:
Index exceeds matrix dimensions.
Error in main_03 (line 13)
plot(X(idx==i,1),X(idx==i,2),'.','MarkerSize',9)
njj1 on 18 Apr 2018
Edited: njj1 on 18 Apr 2018
This is because you used the transpose when computing the clusters, so the vector idx is length 1707, not 256. It might be easier to just enter X=X' before you do any other operations. Try this:
opts = statset('Display','final');
X = X';
[idx,C] = kmeans(X,10,'Distance','sqeuclidean','Replicates',12,'Options',opts);
%Plotting
for i=1:10
plot(X(idx==i,1),X(idx==i,2),'.','MarkerSize',9)
hold on
plot(C(i,1),C(i,2),'kx','markersize',8,'linewidth',2)
end
hold off
title('K-means with 10 Clusters and Centroids')
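As a quick sanity check on the shapes involved, something like the following may help. It is only a sketch: it uses a random placeholder matrix of the same size as in the question (an assumption), not your real data.

```matlab
% Placeholder with the same shape as in the question (assumption).
X = rand(256,1707);

X = X';                           % now 1707-by-256: one row per observation
idx = kmeans(X,10);               % idx has one label per row of X

nLabels = numel(idx);
nRows   = size(X,1);
assert(nLabels == nRows)          % logical indexing X(idx==i,:) needs these to match
```

If the assertion fails, X and idx come from differently oriented matrices, which is what leads to the "Index exceeds matrix dimensions" error.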
Ali Ali on 18 Apr 2018
I don't know, it seems to me there is still a problem even with this.
njj1 on 18 Apr 2018
Try plotting with other dimensions. Like I said, having a dimensionality of 256 is high and may lead to odd results for a few reasons.
To plot in other dimensions try something like this:
opts = statset('Display','final');
X = X';
[idx,C] = kmeans(X,10,'Distance','sqeuclidean','Replicates',12,'Options',opts);
%Plotting
dim1 = 1; %x-axis in your plot
dim2 = 12; %y-axis in your plot
for i=1:10
plot(X(idx==i,dim1),X(idx==i,dim2),'.','MarkerSize',9)
hold on
plot(C(i,dim1),C(i,dim2),'kx','markersize',8,'linewidth',2)
end
hold off
title('K-means with 10 Clusters and Centroids')
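Another option, instead of picking two of the original dimensions, is to project the data onto its first two principal components and plot those. This is a different technique from the plots above, offered only as a sketch: it assumes the Statistics and Machine Learning Toolbox function pca is available, and it uses random placeholder data of the same size as in the question.

```matlab
% Placeholder data (assumption): 1707 observations x 256 features.
rng(0);
X = rand(1707,256);
[idx,C] = kmeans(X,10,'Replicates',3);

% Project the data onto its first two principal components.
[coeff,score] = pca(X);                   % pca centers X by default
mu = mean(X,1);
Cproj = (C - repmat(mu,size(C,1),1)) * coeff(:,1:2);   % centroids in the same 2-D space

figure
for i=1:10
    plot(score(idx==i,1),score(idx==i,2),'.','MarkerSize',9)
    hold on
end
plot(Cproj(:,1),Cproj(:,2),'kx','MarkerSize',8,'LineWidth',2)
hold off
xlabel('PC 1'); ylabel('PC 2');
title('K-means clusters projected onto the first two principal components')
```

The clustering itself is still done in all 256 dimensions; only the picture is reduced to two, so clusters that are well separated in the full space can still overlap in the plot.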
Ali Ali on 18 Apr 2018
Maybe it will work, but this is an image converted to a matrix, and I have to plot all of its pixels.
Image Analyst on 19 Apr 2018
Ali, attach your data in a .mat file if you want more help, to make it easier for people to help you.
Also, you've marked it solved/accepted, so are you all done with this question?
Ali Ali on 21 Apr 2018
Hi,
this is my input.

Asked: 18 Apr 2018

Commented: 21 Apr 2018
