Clustering and feedforwardnet always giving the same result

Question

Hello! I am having some problems with clustering and feedforwardnet. The idea is to compare these two methods and improve the accuracy of feedforwardnet. The problem is that the clustering accuracy is always equal to the feedforwardnet accuracy on the test set. I expected feedforwardnet, trained with the new training data, to give better results.

Algorithm steps:

1 - divide the data into two different sets: a training set and a test set.

2 - create a clustering model using the training set.

3 - try to classify the test set using the clustering model.

4 - create a secondary training set from the elements whose assigned cluster class is identical to the original target class and whose maximum class-membership probability is greater than z (0.7, for example).

5 - use the secondary training set to train a feedforwardnet.

6 - classify the initial test set using the trained feedforwardnet.


  % Read the numeric data from the Excel file
  german_data = xlsread('german_data_numeric.xlsx');
  
  % Fraction of the data used for training (k value)
  training_size = 0.9;
  
  % Shuffle the rows of the matrix so that the split below is random
  [r,~] = size(german_data);
  randomRowIdxs = randperm(r);
  german_data = german_data(randomRowIdxs,:);
  
  % Split the dataset in two: the training set and the test set
  % (use the row count r; length() returns the largest dimension)
  n_train = round(training_size*r);
  X_train = german_data(1:n_train,:);
  X_test = german_data(n_train+1:end,:);
  
  % Extract the labels (the last column of german_data) as row vectors
  Y_train = X_train(:,end)';
  Y_test = X_test(:,end)';
  
  % Remove the labels from the datasets and transpose them
  % (one column per observation)
  X_train = X_train(:,1:end-1)';
  X_test = X_test(:,1:end-1)';
  
  % Training model with the initial training lot (clustering)
  GMModel = fitgmdist(X_train',2,'SharedCovariance',true,'CovarianceType','diagonal','Replicates',1000,'Start','randSample','RegularizationValue',0.01);
  
  % class contains the cluster assigned to each column of X_train and P
  % contains the posterior probabilities for each cluster
  [class, nlogl, P, logpdf] = cluster (GMModel, X_train');
  class = verify_class(class,Y_train);
  
  % Create the secondary training set for each z value
  % (see the function 'generate_datasets')
  
  z = 0.70;
  [generated_Xtrain_70, generated_Ytrain_70] = generate_datasets(X_train,Y_train,z,P,class);
  
  % Use the secondary training set to fit another clustering model (using z = 0.7, for example)
  generated_GMModel = fitgmdist(generated_Xtrain_70',2,'RegularizationValue',0.01);
  [generated_class, gen_nlogl, gen_P, gen_logpdf] = cluster (generated_GMModel,generated_Xtrain_70');
  generated_class = verify_class(generated_class,generated_Ytrain_70);
  
  % Testing the created models
  [initial_class_test, ~, ~] = cluster (GMModel, X_test');
  initial_class_test = verify_class(initial_class_test,Y_test);
  [secondary_class_test, ~, ~] = cluster (generated_GMModel, X_test');
  secondary_class_test = verify_class(secondary_class_test,Y_test);
  
  % MLP training with the secondary training data
  
  % The test set is appended last so that 'divideblock' places it in the test block
  X_MLP = horzcat(generated_Xtrain_70,X_test);
  Y_MLP = horzcat(generated_Ytrain_70,Y_test);
  
  % Create a multilayer network with 3 hidden layers of 10 neurons each
  % ([10 10 10], for example)
  net = feedforwardnet([10 10 10]); 
  net = configure(net,X_MLP,Y_MLP);
  net.performParam.regularization = 0.19;
  net.divideFcn = 'divideblock';
  % Ratios chosen so that the last n_test samples form the test block
  n_test = size(X_test,2);
  net.divideParam.trainRatio = 0.7*(length(Y_MLP)-n_test);
  net.divideParam.valRatio   = 0.3*(length(Y_MLP)-n_test);
  net.divideParam.testRatio  = n_test;
  % The test block is the same test set used for the initial clustering
  
  [net, training_record] = train(net,X_MLP,Y_MLP);
  
  % Training Confusion Plot Variables
  yTrn = net(X_MLP(:,training_record.trainInd));
  tTrn = Y_MLP(:,training_record.trainInd);
  
  % Validation Confusion Plot Variables
  yVal = net(X_MLP(:,training_record.valInd));
  tVal = Y_MLP(:,training_record.valInd); 
  
  % Test Confusion Plot Variables
  yTst1 = net(X_MLP(:,training_record.testInd));
  tTst1 = Y_MLP(:,training_record.testInd);
  
  % Network outputs for the initial test set
  yTst = net(X_test);
  tTst = Y_test;

  % Percentage of test samples whose rounded output matches the target
  accuracy = mean(round(yTst) == tTst)*100
  
  %% Plotting the confusion matrices
  
  % plotconfusion expects targets of 0 or 1, so we subtract 1 from each
  % array because the classes are 1 and 2.
  
  plotconfusion(Y_train-1, class'-1, 'Initial training lot',Y_test-1, initial_class_test'-1, 'Initial test lot',...
      generated_Ytrain_70-1,generated_class'-1, 'Generated training lot',Y_test-1,secondary_class_test'-1, 'Secondary test lot',...
      tTrn-1, yTrn-1, 'NN Training', tVal-1, yVal-1, 'NN Validation', tTst1-1, yTst1-1, 'NN Test');


  % Verify_class function:
  
  function [verified_class] = verify_class(class,target)
  % Cluster numbering is arbitrary: if the accuracy is below 50%, swap the
  % two cluster labels (3 is used as a temporary value)
  accuracy = length(find(class == target'))/length(target)*100;
  if(accuracy<50)
      class(class==1) = 3;
      class(class==2) = 1;
      class(class==3) = 2;
  end
  verified_class = class;
  end
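
The helper generate_datasets used for step 4 is not shown in this post. The following is only a minimal sketch of what such a function might look like, assuming X holds one observation per column, Y is a 1-by-n row of labels, P is the n-by-2 posterior matrix returned by cluster, and class has already been aligned with the targets by verify_class. The output names X_sel and Y_sel are hypothetical, and the original implementation may differ.

```matlab
function [X_sel, Y_sel] = generate_datasets(X, Y, z, P, class)
% Hypothetical sketch of the step-4 helper (not the original code).
% X     : d-by-n feature matrix, one column per observation
% Y     : 1-by-n target labels (1 or 2)
% z     : posterior-probability threshold, e.g. 0.7
% P     : n-by-k posterior probabilities returned by cluster()
% class : n-by-1 cluster labels, already aligned with the targets

% Highest posterior probability of each observation, as a row vector
maxP = max(P, [], 2)';

% Keep observations whose cluster label equals the target label and
% whose highest posterior probability exceeds the threshold
keep = (class(:)' == Y(:)') & (maxP > z);

X_sel = X(:, keep);
Y_sel = Y(keep);
end
```

Because only the row-wise maximum of P is used, the selection does not depend on whether verify_class swapped the two cluster labels.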
