How can I improve Matlab perfomance for this code?

Question

0 Stimmen

Adn and custo are same size vectors, with a length of 10252 elements (length(Adn)=10252). Every row value in custo corresponds to the cost of the same row value in Adn . I'm trying to get the sum of all the possible combinations for the values in Adn and analyze the cost of that combination. From the Adn combinations, I am just interested in the ones that sum to a specific interval (which is lower than sum(A) and greater than sum(A)*0.9). A is previously written in the code, so it doesn't matter now. Then I want to store the combinations and their equivalent costs ( custoAll ) in two new vectors.

nA=numel(Adn);
for k1=1:nA
   B=combntns(Adn,k1);
   C=sum(B,2);          
   E=combntns(custo,k1);
   F=sum(E,2);
   D=find(((C<(sum(A)))&(C>(sum(A)*0.9))));
   if ~isempty(D)
       combinations=[combinations; B(D,:)];
       custoAll=[custoAll; F(D)];
   end          
end

This code is taking very long though. I know I should preallocate the variables combinations and custoAll otherwise they will change size in every iteration and it'll take longer to run, but I haven't been able to find a good way to do it, since Adn is already so huge. Does anybody have an insight on this?

0 Kommentare
-2 ältere Kommentare anzeigen -2 ältere Kommentare ausblenden

Melden Sie sich an, um zu kommentieren.

Melden Sie sich an, um diese Frage zu beantworten.

Follow Question

Answer 1

Matz Johansson Bergström am 25 Aug. 2014

Bearbeitet: Matz Johansson Bergström am 25 Aug. 2014

In MATLAB Online öffnen

0 Stimmen

It seems to me that combntns (which is the same as nchoosek) is taking a long time to finish. So, the appending of the combinations is not the bottleneck in this case.

Consider your example with 10252 elements. Then during the first loop nchoosek(10252, 1) = 10252. For the next loop we have nchoosek(10252,2)= 52546626. This will be the number of rows of B.

So B will then become nchoosek(10252,2)*8*2 bytes, because we use double precision, which is nchoosek(10252,2)*8*2/2^20 = 801 MB.

Ok, just for the fun of it, for the next loop B will become nchoosek(10252,3)*3*8/2^20 which is 4 TB. This is just to store the answer to B, we are not even considering the computational time.

So, you will never be able to compute this using matrices in the current way because you cannot possibly store them. I would suggest to use another approach like looping all combinations in some way. I will have to think about it.

The computation "N choose k" will increase and become smaller, sort of like a parabola. The largest number of combinations you will get from

nchoosek(10252, 10252/2)

In which we get from Matlab

Warning: Result may not be exact. Coefficient is greater than 9.007199e+15 and is only accurate to 15 digits

Remember, this is just the number of rows in B.

So I would suggest to use something else to solve the problem. I'm thinking about binary numbers, but I'm not sure. I will have to get back to this question later today.

1 Kommentar
-1 ältere Kommentare anzeigen -1 ältere Kommentare ausblenden

Matz Johansson Bergström am 25 Aug. 2014

Bearbeitet: Matz Johansson Bergström am 25 Aug. 2014

I believe this problem is NP-complete. It can be seen as a relaxed version of the "Subset sum" problem, but they are really the same, see http://en.wikipedia.org/wiki/Subset_sum_problem.

This really means that this kind of problem does not have a program that can compute the answer in a quick way. However, this does not mean it is not solvable, you just have to find another way.

Even if you managed to speed up the code by creating a mask of binary numbers for instance, you will have to find all combinations and there are many iterations.

I would look at the subset sum and see if your question can be modified to that problem and use that code.

Melden Sie sich an, um zu kommentieren.

How can I improve Matlab perfomance for this code?

0 Kommentare
-2 ältere Kommentare anzeigen -2 ältere Kommentare ausblenden

Akzeptierte Antwort

1 Kommentar
-1 ältere Kommentare anzeigen -1 ältere Kommentare ausblenden

Weitere Antworten (0)

Kategorien

Tags

Community Treasure Hunt

How can I improve Matlab perfomance for this code?

0 Kommentare -2 ältere Kommentare anzeigen -2 ältere Kommentare ausblenden

Akzeptierte Antwort

1 Kommentar -1 ältere Kommentare anzeigen -1 ältere Kommentare ausblenden

Weitere Antworten (0)

Kategorien

Tags

Siehe auch

Community Treasure Hunt

0 Kommentare
-2 ältere Kommentare anzeigen -2 ältere Kommentare ausblenden

1 Kommentar
-1 ältere Kommentare anzeigen -1 ältere Kommentare ausblenden