K-means Clustering

Avishek Dutta on 7 Jun 2012
Hi, I am trying to do an unsupervised classification on IMU data.
I have a raw data file of variables like acceleration (x,y,z), gyro (x,y,z) etc. (15 columns in total).
When I apply
opts = statset('Display','final');  % print only the final kmeans summary
idx = kmeans(DataUnsup,NoOfClassesUnsup,'Distance','correlation','Replicates',NoOfIterations,'Options',opts);
sizeFullData = size(DataUnsup,1);   % number of rows (samples)
T = 1:sizeFullData;
plot(T',idx,'g+');                  % cluster index of each row vs. row number
I am NOT getting a good clustering. This is raw data: a single file containing all movements (walk, run, sit, stand etc.) recorded together in one go. There is no information about the grouping or order of the data. DataUnsup usually consists of a mixture of the 15 variables, if not all of them.
Can someone please guide me?
Avishek
P.S. sqeuclidean, cosine etc. are not working either.

Answers (2)

Walter Roberson on 7 Jun 2012
What you write suggests that kmeans is not a good classifier for you to use.
  2 Comments
Avishek Dutta on 7 Jun 2012
Agreed. After a week of manipulating data, re-recording etc., I have reached the same conclusion.
Does anyone have similar experiences?
Any suggestions as to which mechanism to choose?
Thanks & Regards
Walter Roberson on 7 Jun 2012
Did PCA find anything interesting?
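For readers who want to try the PCA check asked about here, the following is a minimal sketch and not anything posted in the thread. It assumes the Statistics Toolbox function pca is available, uses synthetic data as a stand-in for DataUnsup, and standardizes the columns with zscore first; that last step is an assumption on the grounds that acceleration and gyro columns have different units.

```matlab
% Synthetic stand-in for DataUnsup (assumption: rows are samples,
% columns are the 15 IMU variables). Two groups offset in every column.
rng(0);                                  % reproducible random numbers
X = [randn(200,15); randn(200,15) + 4];

% Assumption: put all columns on a common scale before PCA.
Z = zscore(X);

% coeff: loadings (15-by-15), score: data in principal-component space,
% explained: percentage of total variance per component.
[coeff, score, ~, ~, explained] = pca(Z);
fprintf('Variance explained by first two components: %.1f%%\n', ...
        sum(explained(1:2)));

% If distinct movements form groups, they may show up as separate clouds here.
figure;
scatter(score(:,1), score(:,2), 10, 'filled');
xlabel('PC 1'); ylabel('PC 2');
```

If the scatter of the first two components shows no visible grouping, that would be consistent with kmeans failing to find compact clusters on the same data.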



Peter Perkins on 7 Jun 2012
Avishek, it's not clear what you mean by "NOT getting a good clustering". If I understand your code correctly, you are plotting the cluster number of each row in the data vs. its row number. Unless the data are already in a special order, there's no reason why you would expect to see anything other than a big jumble of points along discrete horizontal lines. Perhaps you are seeing one big jumble and a bunch of (near) singletons and observing that you have no useful clusters. Perhaps your description was intended to mean that the data are in some special order, and you are just testing whether or not kmeans can recreate it. I can't tell.
Two things:
  • The silhouette function may prove useful to visualize whether or not kmeans found "good" clusters.
  • You're using correlation distance which will only be useful for a very particular kind of data. I don't know anything about your data, so you may have a good reason for using correlation. You do say you tried squared euclidean and cosine distance, so perhaps correlation distance was just your last try at getting something to work.
Hope this helps.
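The silhouette check suggested above could look roughly like the sketch below. It is an illustration under stated assumptions, not code from the thread: the matrix X is synthetic data standing in for DataUnsup, k is chosen to match the synthetic groups, and squared Euclidean distance is used for both kmeans and silhouette so that the two agree.

```matlab
% Synthetic stand-in for DataUnsup: three well-separated groups, 15 columns.
% (Assumption: one observation per row, as in the original kmeans call.)
rng(0);                                  % reproducible random numbers
X = [randn(100,15); randn(100,15) + 6; randn(100,15) - 6];
k = 3;                                   % number of clusters to request

% Cluster with squared Euclidean distance and a few random restarts.
idx = kmeans(X, k, 'Distance', 'sqeuclidean', 'Replicates', 5);

% One silhouette value per row, in [-1, 1]. Values near 1 mean the row sits
% well inside its cluster; values near 0 or below mean it does not.
s = silhouette(X, idx, 'sqeuclidean');
meanSil = mean(s);
fprintf('Mean silhouette value: %.2f\n', meanSil);

% Called without an output argument, silhouette draws the silhouette plot.
figure;
silhouette(X, idx, 'sqeuclidean');
```

Repeating this with each distance that was tried (for example 'cosine' or 'correlation' in both calls) gives a like-for-like way to compare the results, rather than judging them from the plot of cluster index against row number.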
  2 Comments
Avishek Dutta on 8 Jun 2012
You are correct in all your assumptions. As I said, this data is special: two IMU sensors (arm and leg) recording movement in 7 different scenarios (WALK, RUN, SIT and so on) together in ONE GO.
In fact I have 4 such data files: Arm_120Hz, Arm_10Hz, Leg_120Hz and Leg_10Hz.
KMEANS cannot separate the scenario clusters in any of them. What I see is always something like this:
7 +++++++++++ +++ +++ +++++
6 ++++++++ ++ ++++
5 ++++++
4 +++ +
3
2 +++++++
1 ++++++
"+" marks the data points; 7 is the number of classes.
I never get a compact cluster. Does this explain the problem a bit more clearly?
Will try the silhouette function.
Thanks
Avishek Dutta on 8 Jun 2012
Sorry, the editor removed the spaces I put in to show the unclustered points. Please ignore the sketch.

