Why does the kmeans function give different answers?

Question

0 votes

I have noticed that, for a given value of k, kmeans returns different cluster indices in a single run than it does when called inside a loop over k (say, k = 2:N). I do not understand why. Could someone please explain?


Answer 1

José-Luis on 22 Sep. 2014

1 vote

Because, with the default settings, kmeans() chooses its starting centroids randomly. The algorithm is therefore not deterministic, and the result can depend on those starting positions.

2 comments

Mahesh on 22 Sep. 2014

So what is the default setting, then? I have used:

rng('default');

Am I right?

Adam Filion on 22 Sep. 2014

Try using the 'replicates' option for kmeans to automatically run the algorithm multiple times and return the best answer:

>> doc kmeans

You can control the sequence of random numbers that is generated with the rng command:

>> doc rng

Putting something like rng(3) before kmeans will make the results repeatable even though it involves random starting points.
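
As a minimal sketch of that advice: the snippet below seeds the generator with the same value before two separate kmeans calls and checks that the cluster indices match. The data matrix X is synthetic and only stands in for your own data; the seed 3 is the one mentioned above, and any fixed seed should behave the same way.

```matlab
% Hypothetical synthetic data: two Gaussian blobs (stand-in for your own matrix)
rng(0);
X = [randn(50,2); randn(50,2) + 4];

% Seed the generator, then cluster
rng(3);
idx1 = kmeans(X, 2);

% Re-seed with the same value and cluster again
rng(3);
idx2 = kmeans(X, 2);

% Same seed, same data, same options: the cluster indices are identical
sameResult = isequal(idx1, idx2);
disp(sameResult)
```

Note that the seed has to be set immediately before each call you want to compare. Any code in between that draws random numbers changes the generator state that kmeans sees.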


Answer 2

Image Analyst on 22 Sep. 2014

0 votes

http://www.mathworks.com/help/stats/k-means-clustering.html

Like many other types of numerical minimizations, the solution that kmeans reaches often depends on the starting points. It is possible for kmeans to reach a local minimum, where reassigning any one point to a new cluster would increase the total sum of point-to-centroid distances, but where a better solution does exist. However, you can use the optional 'replicates' parameter to overcome that problem.
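
To illustrate the idea, here is a rough sketch that runs kmeans several times from different random starts and keeps the run with the smallest total distance, which is in effect what the 'Replicates' option automates. The data, the number of clusters and the number of runs are made up for illustration.

```matlab
% Hypothetical synthetic data: three Gaussian blobs (not from the original post)
rng(1);
X = [randn(100,2);
     randn(100,2) + repmat([4 0],100,1);
     randn(100,2) + repmat([2 4],100,1)];

k = 3;
nRuns = 10;
totals = zeros(nRuns,1);

% Independent single-start runs; each may end in a different local minimum
for r = 1:nRuns
    [~,~,sumd] = kmeans(X,k);
    totals(r) = sum(sumd);      % total within-cluster distance for this run
end

% Keep the run with the smallest total distance
[bestTotal,bestRun] = min(totals);
fprintf('Best of %d manual runs: run %d, total distance %.2f\n', nRuns, bestRun, bestTotal);

% Built-in equivalent: kmeans repeats the clustering and returns the best solution
[idx,C,sumdRep] = kmeans(X,k,'Replicates',nRuns);
fprintf('Total distance with Replicates: %.2f\n', sum(sumdRep));
```

More replicates make a poor local minimum less likely, but they do not guarantee the global optimum, and the two totals printed above need not be equal because the runs use different random starts.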

1 comment

Mahesh on 22 Sep. 2014

Yes, I understand. However, I get one answer when I run kmeans for a single number of clusters, like

      [idx,cent,sumdist] = kmeans(param_sac,nkmeans,'dist',dist_alg,...
          'replicates',8, 'display','iter');

and a different one when I run it inside a loop, like

rng('default');  % For reproducibility
param_sac = load('param2W_sac.cld');
size(param_sac);
dist_alg = 'sqEuclidean';
iditer = [];
sumdistitr = [];
meansil = [];
silhitr = [];
for nkmeans = 1:10;
    [idx,cent,sumdist] = kmeans(param_sac,nkmeans,'dist',dist_alg,...
        'replicates',nkmeans, 'display','iter');
    [silh,h] = silhouette(param_sac,idx);
    xlabel('Silhouette Value')
    ylabel('Cluster');
    meanh = mean(silh);
    iditer = [iditer idx];
%     cen = [cen cent];
%     sumdistitr = [sumdistitr sumdist];
    meansil = [meansil; nkmeans meanh];
    silhitr = [silhitr silh];    
end

I get a totally different classification.

Thanks to all for the responses.
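
One possible explanation, offered as a sketch rather than a diagnosis: in the code above the single call uses 'replicates',8 while the loop uses 'replicates',nkmeans, and rng('default') is set only once before the loop, so every iteration starts from a different generator state. The example below uses hypothetical synthetic data (X stands in for param_sac) and re-seeds before every call with the same number of replicates; under those assumptions the loop reproduces the single run.

```matlab
% Hypothetical synthetic data standing in for param_sac
rng(0);
X = [randn(60,2);
     randn(60,2) + repmat([5 0],60,1);
     randn(60,2) + repmat([0 5],60,1)];

k    = 3;    % number of clusters to compare
nrep = 8;    % same number of replicates in both cases
seed = 3;    % any fixed seed

% Single run
rng(seed);
idxSingle = kmeans(X, k, 'Distance','sqeuclidean', 'Replicates', nrep);

% Loop over several k, re-seeding right before every kmeans call
idxLoop = [];
for nk = 2:4
    rng(seed);
    idx = kmeans(X, nk, 'Distance','sqeuclidean', 'Replicates', nrep);
    if nk == k
        idxLoop = idx;   % keep the result for the k used in the single run
    end
end

% Identical seed, data and options give identical cluster indices
sameResult = isequal(idxSingle, idxLoop);
disp(sameResult)
```

If the seed is set only once before the loop, or the number of replicates changes with k, the call for a given k no longer sees the same random starts as the standalone call, so different indices are to be expected.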
