K-means for stock market timeseries
Abdelrazzaq
on 19 Jan. 2014
Answered: Abdelrazzaq
on 2 Feb. 2014
Hi,
I am researching the accuracy of different volatility models for forecasting stock market volatility, using index time series. I need to cluster the data with k-means into two groups. I already have time series from several stock markets, all of the same length, and I just need to split each of them into two subsets. The first subset will be used to train the models and the second to test them and evaluate their forecasts. Could you share the code directly, or at least show how to get started with k-means in MATLAB?
I look forward to hearing from you.
Regards, Abdelrazzaq.
Accepted Answer
AJ von Alt
on 20 Jan. 2014
Edited: AJ von Alt
on 20 Jan. 2014
The function kmeans is part of the Statistics Toolbox in MATLAB. The following code demonstrates how to use k-means to cluster data into two groups and pull out the individual groups.
% Generate random data
nSamples = 100;
sampleWidth = 5;
X = rand(nSamples,sampleWidth);
trainingSetSize = 20;
% Separate into two clusters using squared Euclidean distance
% IDX is nSamples x 1; each element is the cluster label (1 or 2) of the
% corresponding row of X
IDX = kmeans( X , 2 , 'distance' , 'sqEuclidean');
% separate the data into two groups
G1 = X(IDX == 1 , : );
G2 = X(IDX == 2 , : );
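For a single series, the same call works on a column vector. The snippet below is only a rough sketch: the series r is synthetic and stands in for real index data, and the variable names are made up for illustration. It also requests the cluster centroids and counts the members of each group, which is a quick way to see how uneven the two clusters can be.

```matlab
rng(1);                      % fix the seed so the example is repeatable
r = randn(200, 1);           % hypothetical series, one observation per row

% idx holds the cluster label (1 or 2) of each observation,
% C holds the two cluster centroids (2 x 1 here)
[idx, C] = kmeans(r, 2);

% Number of observations assigned to each cluster
counts = accumarray(idx, 1);
disp(counts.');
```

Nothing forces the two counts to be equal, so the sizes of G1 and G2 are decided by the data rather than by you.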
Because k-means groups similar samples together, each group will be internally similar and will differ systematically from the other, so the two groups would likely make very poor training and test sets for an ML algorithm. A much more suitable function for generating training and test sets is randsample, also in the Statistics Toolbox. Sampling the population uniformly at random, without replacement, gives your ML algorithm more diverse training data and helps improve its robustness.
% Randomly select trainingSetSize samples without replacement
rsIDX = randsample( size(X,1) , trainingSetSize );
% Create a logical mask for the selected values
tsMASK = false( nSamples , 1 );
tsMASK( rsIDX ) = true;
% Separate the data into training and test samples.
GTraining = X( tsMASK , : );
GTest = X( ~ tsMASK , : ) ;
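If you prefer not to build the mask by hand, cvpartition from the same toolbox can produce an equivalent random hold-out split. This is a sketch under the same assumptions as the code above (random data, 20 training samples), not the only way to do it; the seed value is arbitrary.

```matlab
rng(1);                               % fix the seed so the split is repeatable
nSamples = 100;
X = rand(nSamples, 5);                % same kind of random data as above
trainingSetSize = 20;

% Hold out all but trainingSetSize observations for testing
c = cvpartition(nSamples, 'HoldOut', nSamples - trainingSetSize);

GTraining = X(training(c), :);        % rows flagged as training
GTest     = X(test(c), :);            % rows flagged as test
```

Here training(c) and test(c) return complementary logical masks, so every row of X ends up in exactly one of the two sets.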