Form a matrix with a variable number of columns (and a fixed number of rows)

I'm clustering a 70,000x3 data matrix (say X is the data and K is the number of clusters). For that I want to store the index values (positions) of the data points belonging to each cluster (required for further calculations). As per the problem, it would be like Y[c, j] (c = cluster number, j = index value of a data point). But the clusters formed are not equal in size: the number of data points belonging to each cluster will be different. Is there a way to form such a matrix with a variable number of columns and a fixed number of rows? If not, please suggest another way.
Thanks in advance!

Accepted Answer

Walter Roberson on 19 Dec 2018
Just store the cluster number for each point. If you need to know which points belong to one particular cluster, use a logical mask, idx == cluster_number, or find(idx == cluster_number). If you do that a lot, calculate it once and store the results in a cell array.
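As a rough sketch of the label-vector approach described above: the data, the label vector idx, and the variable names below are made-up stand-ins (the thread does not show how the labels are produced), so treat it as an illustration rather than a finished implementation.

```matlab
% Hypothetical stand-in data: 10 points in 3-D and K = 3 clusters.
X   = reshape(1:30, 10, 3);           % replace with the real 70000-by-3 matrix
K   = 3;
idx = [1 2 3 1 2 3 1 1 2 1].';        % assumed cluster label for each row of X

% One-off lookup: logical mask, or explicit row numbers, for a single cluster.
mask2 = (idx == 2);                   % 10-by-1 logical vector
rows2 = find(mask2);                  % row numbers of the points in cluster 2

% Repeated lookups: compute the member lists once and keep them in a cell array.
members = cell(K, 1);
for c = 1:K
    members{c} = find(idx == c);      % list lengths may differ between clusters
end

counts = cellfun(@numel, members);    % number of points in each cluster
X1     = X(members{1}, :);            % data rows that belong to cluster 1
```

Each cell holds a column vector of its own length, so members{c}(j) plays the role of Y[c, j] without forcing every cluster to have the same number of points.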
If it is for some reason particularly important to store everything in a single numeric array, then zero-pad or NaN-pad the shorter rows.
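A minimal sketch of the padded single-array variant, again with a made-up label vector idx; NaN is used as the pad value here because 0 is easier to mistake for a real entry, but that choice is an assumption and zero-padding works the same way.

```matlab
% Hypothetical labels: 10 points assigned to K = 3 clusters.
idx = [1 2 3 1 2 3 1 1 2 1].';
K   = 3;

% Count the members of each cluster; the widest row sets the column count.
counts = accumarray(idx, 1, [K 1]);
Y      = nan(K, max(counts));         % K rows, padded with NaN

% Row c holds the row numbers of the points in cluster c, then NaN padding.
for c = 1:K
    Y(c, 1:counts(c)) = find(idx == c).';
end

% Reading a cluster back means stripping the padding first.
row      = Y(2, :);
members2 = row(~isnan(row));          % row numbers of the points in cluster 2
```

With 70,000 points and a handful of clusters the padding costs little memory, but every later use has to remember to drop the NaN entries, which is the main argument for the cell array.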

More Answers (1)

Image Analyst on 19 Dec 2018
Why should they have the same number? What if they don't? I guess you could force it by taking the principal components with pca(), sorting on PC1, and then splitting at the halfway point into two clusters. Would that do what you want?
Also, what are X and K? Is K the number of clusters you want, like 2 or 3? Then what is X?
Attach your data with the paper clip icon, and a screenshot of it plotted with scatter3() using the insert frame icon.
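For reference, the sort-on-PC1 split mentioned above could look roughly like this. The sketch computes the first principal component with svd on the centred data instead of calling pca(), only so that it runs without a toolbox; the data and variable names are invented for illustration.

```matlab
% Hypothetical stand-in data; replace with the real 70000-by-3 matrix.
X = [0 0 0; 1 1 1; 2 2 2; 3 3 3; 10 10 10; 11 11 11; 12 12 12; 13 13 13];
n = size(X, 1);

% First principal component from the SVD of the centred data.
Xc = X - repmat(mean(X, 1), n, 1);
[~, ~, V] = svd(Xc, 'econ');
pc1 = Xc * V(:, 1);                   % score of each point along PC1

% Sort on PC1 and cut at the halfway point to get two equal-sized groups.
[~, order] = sort(pc1);
half = floor(n / 2);
idx  = zeros(n, 1);
idx(order(1:half))     = 1;
idx(order(half+1:end)) = 2;
```

The sign of a principal component is arbitrary, so which half ends up labelled 1 can differ between runs or platforms; only the grouping itself is meaningful.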
  1 Comment
Pushkar Khatri on 20 Dec 2018
Sorry for the confusion: X = data matrix, K = number of clusters (5 or 7).
I am saying that the number of data points in each cluster won't be the same, since the points are allocated to clusters randomly at first (also, 70000/K will not be an integer for all K).
Why is the sorting needed? Can't I just store the index values of the data points of a cluster (if that's possible) and then use them for further clustering?
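To illustrate that stored indices can feed a later step without any sorting of the data, here is a small sketch that groups row numbers by label with accumarray and then computes one centroid per cluster. The data, the labels, and the choice of a centroid update as the "further calculation" are all assumptions made for the example.

```matlab
% Hypothetical stand-in data: 6 points in 3-D and K = 2 clusters.
X   = [0 0 0; 2 2 2; 4 4 4; 10 10 10; 12 12 12; 14 14 14];
K   = 2;
idx = [1 1 1 2 2 2].';                % e.g. the result of an initial allocation

% Group the row numbers by label in one pass; each cell has its own length.
% sort() is applied because accumarray does not guarantee the order of v.
members = accumarray(idx, (1:numel(idx)).', [K 1], @(v) {sort(v)});

% Use the stored indices in a later step, here one centroid per cluster.
centroids = zeros(K, size(X, 2));
for c = 1:K
    centroids(c, :) = mean(X(members{c}, :), 1);
end
```

Nothing here requires 70000/K to be an integer, because no cluster is forced to a fixed size.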
