160 views (last 30 days)

Hi everybody, i am going mad at trying to solve this problem:

For a given input vector P, for example in my case

P=[4.5; 4.5; 5.25; 5.25; 2.25; 0.6; 0.5; 0.6; 0.5; 0.3; 1; 2; 0.5; 0.5]

i create a matrix that HAS TO HAVE 3 columns and contain all P elements (order isn't important) and zero in the free places, for example in my case

G =

4.5000 4.5000 5.2500

5.2500 2.2500 0.6000

0.5000 0.6000 0.5000

0.3000 1.0000 2.0000

0.5000 0.5000 0

my goal is to swap matrix elements between columns to reach the disposition that satisfies this property: say that the sum of all elements in vector P is S.

The disposition should satisfy the property that the sum of all elements in column 1 must be as close as possible to S/3 and the same for column 2 and column 3.

; ;

my first attempt was trying to generate all possible matrix disposition or permutation an find this memorizing the matrix that get closer to my goal by while cycle with this condition

the problem is that using perms(G) for generating it unfortunately I get out of memory because perms generates ALL possible permutation but i need only combination of n element i 3 groups that is much less than all possible combination but i don't know how to create it

Example of one possible output found by hand (order in elements of the same column doesn't matter):

G =

4.5000 5.2500 5.2500

4.5000 2.2500 0.6000

0.5000 0.6000 1.0000

0.3000 0.5000 2.0000

0 0.5000 0.5000

sum_colon1=9.8

sum_colon2=9.1

sum_colon3=9.35

eps_colon1=0.383

eps_colon2=0.316

eps_colon3=0.066

minimum=0.766

Guillaume
on 8 Apr 2019

Edited: Guillaume
on 9 Apr 2019

My comment to John made me realise what an efficient way to go about it would be. The crux of the problem is creating the unique permutations of repelem(1:numberofsets, setsize) or finding unique permutations of a set with repetition. I'm sure that there's something to do that already on the file exchange but I can't find it.

I've left my original answer below, but don't use it. This is much faster and more efficient. Note that the code for finding unique permutations is heavily inspired by this post. First the helper function that generates the unique permutations

function partitions = uniquesets(setcount, setlength)

%generate unique partitions of P sets of length N

%number of partitions = factorial(P*N) / factorial(N)^P

%setcount: the number of sets in the partition

%setlength: the length of each set in the partition

%partitions: a K*L array, where K = setcount*setlength and L = factorial(setcount*setlength) / factorial(setlength)^setcount. Each column is a partition. Rows are set indices from 1 to setcount

lefttoplace = setcount * setlength;

partitions = zeros(lefttoplace, 1);

nperms = 1;

for subset = 1:setcount

locs = nchoosek(1:lefttoplace, setlength).';

[emptyrows, ~] = find(~partitions);

emptyrows = reshape(emptyrows, [], nperms);

destrows = emptyrows(locs, :);

nperms = nperms * size(locs, 2);

destcols = repelem((1:nperms)', setlength);

partitions = repelem(partitions, 1, size(locs, 2)) + accumarray([destrows(:), destcols], subset);

lefttoplace = lefttoplace - setlength;

end

end

Then the code to use it:

P = [4.5; 4.5; 5.25; 5.25; 2.25; 0.6; 0.5; 0.6; 0.5; 0.3; 1; 2; 0.5; 0.5];

phase = 3;

P = [P; zeros(mod(-numel(P), phase), 1)];

partitions = uniquesets(phase, numel(P)/phase);

[~, order] = sort(partitions, 1);

allmatrices = reshape(P(order), numel(P)/phase , phase, []);

colsum = sum(allmatrices, 1);

distance = sum(abs(colsum - sum(P)/3), 2);

[minsum, whichmatrix] = min(distance)

chosenmatrix = allmatrices(:, :, whichmatrix)

mincolsum = colsum(1, :, whichmatrix)

This is very quick, about half a second on my machine and will work for much longer P.

It still does create a few duplicate matrices as duplicate numbers in P are still considered distinct for the purpose of partitioning. However, it does not creat unecessary partitions.

-------

Original answer, which wasn't efficient at all

Bearing in mind this is not my domain at all (this is a problem for John), here is a solution that works for this problem size but is probably very inefficient.

It uses this fileexchange entry to find all partitions of size 3 of your P vector. This is very inefficient since of these, you only want the partitions where each subset has size 5. Still, it's better than computing all permutations. In addition, duplicate numbers are considered distinct for the partitioning problem, again giving you more partitions than necessary:

P = [4.5; 4.5; 5.25; 5.25; 2.25; 0.6; 0.5; 0.6; 0.5; 0.3; 1; 2; 0.5; 0.5];

parts = partitions([P;zeros(mod(-numel(P), 3), 1)], 3); %generates 2375101 partitions, takes a little while

parts = parts(cellfun(@(c) all(cellfun('length', c) == 5), parts)); %only keep partitions with subsets of length 5, there are 126126 of these. Also takes a while

allmatrices = cellfun(@(c) [c{:}], parts, 'UniformOutput', false); %convert each partition into a 5x3 matrix

allmatrices = cat(3, allmatrices{:}); %and convert into a 5x3x126126 matrix.

%note that some matrices are duplicated since as said, duplicate numbers were considered distinct for the partitioning purpose

[minsum, which] = min(sum(abs(sum(allmatrices, 1) - sum(P)/3), 2))

selectedmatrix = allmatrices(:, :, which)

columnsum = sum(selectedmatrix, 1)

columneps = columnsum - sum(P)/3

minsum =

0.76667

which =

29036

selectedmatrix =

4.5 5.25 5.25

4.5 2.25 1

0.5 0.6 2

0.3 0.6 0.5

0 0.5 0.5

columnsum =

9.8 9.2 9.25

columneps =

0.38333 -0.21667 -0.16667

Guillaume
on 9 Apr 2019

quite don't understand what this does

As I wrote, it's based on this code by Bruno Luong. It calculates all unique permutations of a set of numbers (setcount) each repeated setlength times. Each column of the output is a different permutation.

E.g for 3 sets (3 columns) of numbers repeated 5 times, you get:

>> partitions = uniquesets(3, 5);

>> partitions(:, 1:10) %look at the 1st 10 columns

ans =

1 1 1 1 1 1 1 1 1 1

1 1 1 1 1 1 1 1 1 1

1 1 1 1 1 1 1 1 1 1

1 1 1 1 1 1 1 1 1 1

1 1 1 1 1 1 1 1 1 1

2 2 2 2 2 2 2 2 2 2

2 2 2 2 2 2 2 2 2 2

2 2 2 2 2 2 2 2 2 2

2 2 2 2 2 2 3 3 3 3

2 3 3 3 3 3 2 2 2 2

3 2 3 3 3 3 2 3 3 3

3 3 2 3 3 3 3 2 3 3

3 3 3 2 3 3 3 3 2 3

3 3 3 3 2 3 3 3 3 2

3 3 3 3 3 2 3 3 3 3

So, each column gives you a different way of partitioning P into 3 columns. First partition column tells you to put the 1st 5 elements of P into the 1st column of G, the next 5 in the 2nd column, and the last 5 in the 3rd. The second partition column swaps P(9) and P(10). The 3rd partition swaps P(9) and P(11) compared to the 1st one, etc.

I've just realised it calculates too many partitions, since for your purpose partitions [1 1 1 1 1 2 2 2 2 2 3 3 3 3 3] and [3 3 3 3 3 2 2 2 2 2 1 1 1 1 1] are the same. It just swaps whole columns of G. Considering that it is fast enough as it is, improvement so that these permutations are not generated is left as an exercise to the reader.

Is this code capable of making every permutation different from the previous?

Yes, as said it generates too many, but it won't miss any. As I wrote as a comment in the code it generates permutations.

Sign in to comment.

John D'Errico
on 8 Apr 2019

Edited: John D'Errico
on 8 Apr 2019

You are going mad because ...

- Your problem is poorly formulated. i.e., how close to S/3 do you need to be?
- For long vectors, your problem really is large, especially as you are trying to solve it.

For a vector of length n, there are 2^n subsets of elements of the vector. This is well known, but 2^n grows exponentially. Note that perms is not appropriate here, so at least you don't have factorial growth, but it is still big.

P=[4.5; 4.5; 5.25; 5.25; 2.25; 0.6; 0.5; 0.6; 0.5; 0.3; 1; 2; 0.5; 0.5];

numel(P)

ans =

14

Given the vector P of length 14, there are 2^14 subsets of elements to consider. We can think of them as the bits:

B14 = dec2bin(0:2^14-1);

>> B14(1:10,:)

ans =

10×14 char array

'00000000000000'

'00000000000001'

'00000000000010'

'00000000000011'

'00000000000100'

'00000000000101'

'00000000000110'

'00000000000111'

'00000000001000'

'00000000001001'

That was for the first 10 subsets we might choose. So each element of P is either included or excluded in a sum, as indicated by the corresponding bit.

(I'll discuss later the issue that it looks like you want at MOST 3 terms in the final sum.)

Now, your goal is essentially to find all such sums. In the end, you probably want to rank the sums, sorting them such that those with a sum closes to S/3 are at the top. But since you never say how close to S/3 you want to allow the sums to go, you effectively need to compute them all. This is not even remotely difficult for a vector of length only 14 though.

P = [4.5; 4.5; 5.25; 5.25; 2.25; 0.6; 0.5; 0.6; 0.5; 0.3; 1; 2; 0.5; 0.5];

n = numel(P);

S = sum(P);

binP = dec2bin(0:2^n-1) - '0';

allsums = binP*P;

[err,tags] = sort(abs(allsums - S/3),'ascend');

allsums = allsums(tags);

Here, the target sum is:

S/3

ans =

9.41666666666667

It turns out there are 57 distinct ways to form a sum of 9.4, but we can clearly never acheive 9.41666666666667.

allsums(1:100)'

ans =

Columns 1 through 11

9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4

Columns 12 through 22

9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4

Columns 23 through 33

9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4

Columns 34 through 44

9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4

Columns 45 through 55

9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4

Columns 56 through 66

9.4 9.45 9.45 9.45 9.45 9.45 9.45 9.45 9.45 9.45 9.45

Columns 67 through 77

9.45 9.45 9.45 9.45 9.45 9.45 9.45 9.45 9.45 9.45 9.45

Columns 78 through 88

9.45 9.45 9.45 9.45 9.45 9.45 9.45 9.45 9.45 9.45 9.45

Columns 89 through 99

9.35 9.35 9.35 9.35 9.35 9.35 9.35 9.35 9.35 9.35 9.35

Column 100

9.35

What were the 10 first such sums that were found? Represented bya 1 here where a given term was included in the sum, we see:

binP(tags(1:10),:)

ans =

0 0 0 1 1 0 0 1 0 1 0 0 1 1

0 0 0 1 1 0 0 1 0 1 1 0 0 0

0 0 0 1 1 0 0 1 1 1 0 0 0 1

0 0 0 1 1 0 0 1 1 1 0 0 1 0

0 0 0 1 1 0 1 1 0 1 0 0 0 1

0 0 0 1 1 0 1 1 0 1 0 0 1 0

0 0 0 1 1 0 1 1 1 1 0 0 0 0

0 0 0 1 1 1 0 0 0 1 0 0 1 1

0 0 0 1 1 1 0 0 0 1 1 0 0 0

0 0 0 1 1 1 0 0 1 1 0 0 0 1

We can extract the specific terms in one such subset sum as:

binP(tags(1),:).*P'

ans =

Columns 1 through 11

0 0 0 5.25 2.25 0 0 0.6 0 0.3 0

Columns 12 through 14

0 0.5 0.5

So, was this all easy? Well, yes. Trivially so, at least for a vector of length only 14. I could have succeeded up to around 20 elements in the vector P. If you increase the length of P by a significant amount, things will still go to hell. So you can effectively never generate all such sums for a vector of length 50. (At least not with the computers we have available today.) Even that operation for partial sums of 25 terms or so will be brutal.

Remember that we computed ALL such partial sums, because we really don't know when to stop looking at all possible subsets. How far away from S/3 are we allowed to go?

allsums(tags(end))

ans =

28.25

>> S

S =

28.25

>> binP(tags(end),:)

ans =

1 1 1 1 1 1 1 1 1 1 1 1 1 1

So one of those candidate sums was the sum of all elements, and since none of the elements of P were negative, that sum will lie way out in the weeds.

For large, long vectors, is there a better way? Well yes. You could write an algorithm that recursively generated subsets of P, then used a tolerance to decide if a sum was too far away. But even this can be very difficult to perform, because you may not be able to know how to generate that tolerance. This could get difficult. Doable. But sometimes difficult.

Now, suppose we want to solve the problem of finding all subsets of at most 3 terms in any candidate sum? This is most easily done using nchoosek. So, I might split the problem into finding all partial sums of 1, 2, or 3 terms, then combining them at the end.

S = sum(P);

Starget = S/3;

Stol = 0.2; % a tolerance on the sum error

T3 = nchoosek(P,3);

err = abs(sum(T3,2) - Starget);

T3(err > Stol,:) = [];

err(err> Stol) = [];

[err,tags] = sort(err,'ascend');

T3 = T3(tags,:);

This is now trivially easy to write, as you see. And it is efficient, since we only cared about at most 3 terms in the partial sums. In fact, there were only 9 partial sums of 3 terms that were within a tolerance of 0.2 of the target sum.

T3

T3 =

4.5 4.5 0.5

4.5 4.5 0.5

4.5 4.5 0.5

4.5 4.5 0.5

5.25 2.25 2

5.25 2.25 2

4.5 4.5 0.3

4.5 4.5 0.6

4.5 4.5 0.6

Now you could do the same set of operations for exactly 2 terms in the sum, then look at 1 term. Each time, keep only the partial sets that were within the designated tolerance on the sum.

The nice thing is, even for significantly longer vectors P, if you are never interested of partial sums of never more than 3 terms, this is still doable. Thus, for vectors of length 50 or 100,

nchoosek(50,3)

ans =

19600

>> nchoosek(100,3)

ans =

161700

We still only need to consider a few tens of thousands of combinations, or even 100K such partial sets. It will still get nasty for vectors with many thousands of elements to be considered, but you can always overwhelm any computational scheme if you try hard enough..

So don't go mad. Just use the proper tools.

John D'Errico
on 8 Apr 2019

That is eactly what I gave you. In the first part of my answer, I showed how to generate ALL subsets. Thus every possible subset.

Then I showed how to compute ALL possible subsets of only 3 elements. Then I computed the sum of each, sorting them to be in increasing distance from the target. If you want to compute ALL possible subsets, then just don't use a tolerance at all.

Asfar as your comment that each sum needs to be as close as possible to the target sum, that is indeed achieved by my use of a sort and a tolerance. Surely that is what you mean by "as close as possible".'

Let me say it differently:

- The word "all" implies every single subset.
- The phrase "as close as possible" implies there are some subsets that are deemed to be unacceptable, because they are not sufficiently close. That is, they exceed some mininum deviation from the target.

However, you cannot have both all and some at the same time. All is all, and some is not all.In fact, I showed you how to do either of those choices. So what is your problem, if you are not happy with both the "all" solution and the "some" solution?

If you have some other problem, then you need to explain your problem far more clearly, because you are making no sense in these comments.

Guillaume
on 8 Apr 2019

@John, the way I understood the problem, the purpose is not to find a single subset with a given number of elements. Instead it's to partition P into N subsets of equal size, then find which such partition has the minimal absolute difference from S/3 for the N sums of the subsets. So for the bins you're not looking for binary bins:

binP = [

0 0 0 1 1 0 0 1 0 1 0 0 1 1

0 0 0 1 1 0 0 1 0 1 1 0 0 0

...

but for bins of the form

binS = [

1 1 1 1 1 2 2 2 2 2 3 3 3 3 3

1 1 1 1 2 1 2 2 2 2 3 3 3 3 3

1 1 1 1 2 2 1 2 2 2 3 3 3 3 3

... unique permutations of repelem(1:3, 5)

Once you have binS, it is trivial to compute the subset sums:

N = 3;

P = [P;zeros(mod(-numel(P), N), 1)]; %pad to multiple of N

[~, order] = sort(binS);

matrices = reshape(P(order)', [], N, size(binS, 1));

colsum = sum(matrices, 1);

distance = sum(abs(colsum - sum(P)/3), 2);

[~, chosenmatrix] = min(distance);

Sign in to comment.

Sign in to answer this question.

Opportunities for recent engineering grads.

Apply Today
## 6 Comments

## Direct link to this comment

https://de.mathworks.com/matlabcentral/answers/455037-print-all-possible-matrix-combinations-for-a-given-matrix#comment_691182

⋮## Direct link to this comment

https://de.mathworks.com/matlabcentral/answers/455037-print-all-possible-matrix-combinations-for-a-given-matrix#comment_691182

## Direct link to this comment

https://de.mathworks.com/matlabcentral/answers/455037-print-all-possible-matrix-combinations-for-a-given-matrix#comment_691186

⋮## Direct link to this comment

https://de.mathworks.com/matlabcentral/answers/455037-print-all-possible-matrix-combinations-for-a-given-matrix#comment_691186

## Direct link to this comment

https://de.mathworks.com/matlabcentral/answers/455037-print-all-possible-matrix-combinations-for-a-given-matrix#comment_691187

⋮## Direct link to this comment

https://de.mathworks.com/matlabcentral/answers/455037-print-all-possible-matrix-combinations-for-a-given-matrix#comment_691187

## Direct link to this comment

https://de.mathworks.com/matlabcentral/answers/455037-print-all-possible-matrix-combinations-for-a-given-matrix#comment_691199

⋮## Direct link to this comment

https://de.mathworks.com/matlabcentral/answers/455037-print-all-possible-matrix-combinations-for-a-given-matrix#comment_691199

## Direct link to this comment

https://de.mathworks.com/matlabcentral/answers/455037-print-all-possible-matrix-combinations-for-a-given-matrix#comment_691212

⋮## Direct link to this comment

https://de.mathworks.com/matlabcentral/answers/455037-print-all-possible-matrix-combinations-for-a-given-matrix#comment_691212

## Direct link to this comment

https://de.mathworks.com/matlabcentral/answers/455037-print-all-possible-matrix-combinations-for-a-given-matrix#comment_692242

⋮## Direct link to this comment

https://de.mathworks.com/matlabcentral/answers/455037-print-all-possible-matrix-combinations-for-a-given-matrix#comment_692242

Sign in to comment.