How to find the feature distribution in kmeans clustering

Dhruvin Naik on 10 Feb 2022
Commented: Image Analyst on 15 Feb 2022
I am trying to do kmeans clustering on my data. The data consists of information for each of 56 students: features such as scores for each subject and other metrics like a performance parameter. There are 39 features per student, so the data matrix is 56-by-39. I used kmeans clustering to group the students into two clusters. I have attached the result of the clustering in the figure below; the data is plotted along the principal components. I want to know how the features are distributed across these clusters, something like: score1 is high (above a certain value) in cluster 1 and low in cluster 2, while score2 is low in cluster 1 and high in cluster 2. Is there a way to find out how the features are distributed in these two clusters? I want to find the features that contribute to each kmeans cluster.
I used the idx = kmeans(X,k) function in MATLAB.

Answers (1)

Image Analyst on 10 Feb 2022
Edited: Image Analyst on 10 Feb 2022
You can call pca() to get the loadings (coefficients) and the scores. The loadings matrix has one column per PC: the first column represents PC1, and its 39 values are the weights of the 39 original features in that component. You can also ask pca() for the percentage of the total variance explained by each principal component (the explained output), for example PC1 explains 60% of the variance and PC2 explains 30%.
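A minimal sketch of the pca() outputs described above, assuming the Statistics and Machine Learning Toolbox is available. The random matrix is a hypothetical stand-in for the 56-by-39 student data, and standardizing with zscore() first is an assumption (useful when features have different units), not something stated in the thread.

```matlab
% Hypothetical stand-in data: 56 students x 39 features (replace with the real X).
rng(0);
X = randn(56, 39);

% Assumption: standardize first so features with large units do not dominate.
[coeff, score, latent, ~, explained] = pca(zscore(X));

coeffSize = size(coeff);    % 39-by-39: column j holds the weights of the 39 features in PC j
scoreSize = size(score);    % 56-by-39: row i holds student i in PC coordinates
pc1Loadings = coeff(:, 1);  % the 39 weights that define PC1

% explained(j) is the percentage of total variance captured by PC j.
disp(explained(1:3)');
```

Note that explained refers to the principal components, not to the individual original features; the per-feature information is in the columns of coeff.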
I'm not sure why you're doing kmeans on the PCs in the first place. It seems weird to me. All the PCs are supposed to be uncorrelated, so plotting any one of them against another would just look like a random shotgun blast, kind of like yours does. There is only very weak correlation, as expected. So why do clustering on them? If anything, you'd do kmeans on the original data, not on the principal components.
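One way to act on that suggestion and see which features differ between the clusters is to run kmeans() on the (standardized) original features and compare the cluster centroids feature by feature. This is only a sketch under assumptions: the data is synthetic, and the zscore() scaling, the 'Replicates' setting, and all variable names are illustrative choices, not part of the original question.

```matlab
% Hypothetical stand-in data: 56 students x 39 features (replace with the real X).
rng(1);
X = randn(56, 39);
X(1:28, 1)  = X(1:28, 1)  + 3;   % synthetic shift so feature 1 differs between groups
X(29:56, 2) = X(29:56, 2) + 3;   % synthetic shift so feature 2 differs the other way

k = 2;
Z = zscore(X);                               % put all features on a comparable scale
[idx, C] = kmeans(Z, k, 'Replicates', 10);   % C is k-by-39: one centroid per cluster

% Centroid difference per feature, in standard-deviation units.
centroidDiff = C(1, :) - C(2, :);
[~, order] = sort(abs(centroidDiff), 'descend');

% Per-cluster means in the original units, for reporting.
clusterMeans = zeros(k, size(X, 2));
for c = 1:k
    clusterMeans(c, :) = mean(X(idx == c, :), 1);
end

% Features that separate the two clusters most strongly are listed first.
summary = table(order(:), centroidDiff(order)', ...
    clusterMeans(1, order)', clusterMeans(2, order)', ...
    'VariableNames', {'Feature', 'StdDiff', 'MeanCluster1', 'MeanCluster2'});
disp(summary(1:5, :));
```

A large positive StdDiff means the feature is higher in cluster 1 than in cluster 2, and a large negative value means the opposite. Features near zero do little to distinguish the clusters.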
6 Comments
Dhruvin Naik on 15 Feb 2022
I did PCA on the two clusters and got the principal components for each cluster. Can you please tell me how I should compare the principal components from the two clusters and map them back to the original features, so that I can tell whether a given feature is more dominant in cluster one or in cluster two?
Image Analyst on 15 Feb 2022
The coefficients (the first variable returned by pca()) give you that. They are the relative weights applied to the original variables when each PC is built from the original variable values.
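To make the role of the coefficients concrete, here is a small sketch showing that the scores are the centered data multiplied by the coefficient matrix, and how the features with the largest weights in PC1 could be listed. The data is a random stand-in and the variable names are hypothetical.

```matlab
% Hypothetical stand-in data: 56 students x 39 features (replace with the real X).
rng(0);
X = randn(56, 39);

[coeff, score, ~, ~, ~, mu] = pca(X);   % mu holds the column means that pca() subtracts

% Each PC score is a weighted sum of the centered original features.
reconErr = max(abs((X - mu) * coeff - score), [], 'all');

% Rank the original features by the size of their weight in PC1.
[~, order] = sort(abs(coeff(:, 1)), 'descend');
topFeatures = order(1:5);               % indices of the 5 most heavily weighted features
topWeights  = coeff(topFeatures, 1);    % signed weights of those features in PC1
disp([topFeatures, topWeights]);
```

The sign of a weight is only meaningful relative to the other weights in the same column, since a principal component and its negation describe the same direction.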
