MATLAB Answers

0

Print all possible matrix combinations for a given matrix

Asked by Mattia Deriu on 8 Apr 2019
Latest activity Commented on by Mattia Deriu on 10 Apr 2019
Hi everybody, i am going mad at trying to solve this problem:
For a given input vector P, for example in my case
P=[4.5; 4.5; 5.25; 5.25; 2.25; 0.6; 0.5; 0.6; 0.5; 0.3; 1; 2; 0.5; 0.5]
i create a matrix that HAS TO HAVE 3 columns and contain all P elements (order isn't important) and zero in the free places, for example in my case
G =
4.5000 4.5000 5.2500
5.2500 2.2500 0.6000
0.5000 0.6000 0.5000
0.3000 1.0000 2.0000
0.5000 0.5000 0
my goal is to swap matrix elements between columns to reach the disposition that satisfies this property: say that the sum of all elements in vector P is S.
The disposition should satisfy the property that the sum of all elements in column 1 must be as close as possible to S/3 and the same for column 2 and column 3.
; ;
my first attempt was trying to generate all possible matrix disposition or permutation an find this memorizing the matrix that get closer to my goal by while cycle with this condition
the problem is that using perms(G) for generating it unfortunately I get out of memory because perms generates ALL possible permutation but i need only combination of n element i 3 groups that is much less than all possible combination but i don't know how to create it
Example of one possible output found by hand (order in elements of the same column doesn't matter):
G =
4.5000 5.2500 5.2500
4.5000 2.2500 0.6000
0.5000 0.6000 1.0000
0.3000 0.5000 2.0000
0 0.5000 0.5000
sum_colon1=9.8
sum_colon2=9.1
sum_colon3=9.35
eps_colon1=0.383
eps_colon2=0.316
eps_colon3=0.066
minimum=0.766

  6 Comments

So basically, you're doing a montecarlo simulation. I'm not sure why you're only swapping two elements at a time. You may as well use a random permutation each time.
Simplification of your code, which probably doesn't affect its slowness since it's all down to the number of iterations:
P = [4.5; 4.5; 5.25; 5.25; 2.25; 0.6; 0.5; 0.6; 0.5; 0.3; 1; 2; 0.5; 0.5];
phase = 3;
itermax = 1e9;
P = [P; zeros(-mod(numel(P), phase), 1)];
Gmin = reshape(P, [], phase);
S = sum(P);
ERROR1 = sum(abs(sum(Gmin, 1) - S/3);
for i = 1:itermax
G = reshape(P(randperm(numel(P)), [], phase));
ERROR2 = sum(abs(sum(G, 1) - S/3));
if ERROR2 < ERROR1
ERROR2 = ERROR1
Gmin = G;
end
end
What sort of speed are you expecting. And also what sort of problem size are you expecting to solve? My answer below works for your size but will quickly become unmanageable for larger problems.
Yes my code takes
Elapsed time is 0.042251 seconds.
for reching ERROR 2 as 0.766 but i have to put a tolerance of 0.8 as a stop condition, the problem is that it cannot stop by itself. Anyway many many thanks for your code improvement!!
I've found a way to calculate only the significative permutations and check only them once at a time.
The problem is that your code run out of memory for n=20 more or less. Calculating only all significative matrix speed up the solution. I put the code here if someone could encounter the same problem.
tic
P=[1;2;3;4;1;2;3;1;2;3;1;0.6;2;1;0;0.2;4;2;1;4]; %given input vector
n=length(P);
phase=3;
S=sum(P);
P=P';
Mc=ceil(n/phase);
P=[P zeros(1,Mc*phase-n)];
G=reshape(P,phase,Mc).';
permutation=factorial(3)^(ceil(n/3)); % number of different matrix combinations
%% this create the permutation matrix that has inside
%% all index needed for starting permutation calculations in the next for cycle
for i=Mc:-1:1
for j=1:permutation
PERMAT(i,j)=mod(ceil((j)/(6^(Mc-i))),6);
if PERMAT(i,j)==0
PERMAT(i,j)=6;
end
end
end
%% utilize a cell to store permutations, index i contains all i-row permutations
ALLCOMBS=cell(Mc,1);
for i=1:Mc
ALLCOMBS{i}=perms(G(i,:));
end
%% initialize error by some casual large number
ERROR1=10^7;
%% start creating matrix permutations and searching for the optimum one
for j=1:permutation
for i=Mc:-1:1
G(i,:)=ALLCOMBS{i}(PERMAT(i,j),:);
L1=G(:,1);
L2=G(:,2);
L3=G(:,3);
end
ERROR2=abs(sum(L1)-S/3)+abs(sum(L2)-S/3)+abs(sum(L3)-S/3);
if ERROR2<ERROR1
ERROR1=ERROR2;
Gmin=G;
L_m1=Gmin(:,1);
L2_m=Gmin(:,2);
L3_m=Gmin(:,3);
end
end
toc

Sign in to comment.

2 Answers

Answer by Guillaume
on 8 Apr 2019
Edited by Guillaume
on 9 Apr 2019
 Accepted Answer

My comment to John made me realise what an efficient way to go about it would be. The crux of the problem is creating the unique permutations of repelem(1:numberofsets, setsize) or finding unique permutations of a set with repetition. I'm sure that there's something to do that already on the file exchange but I can't find it.
I've left my original answer below, but don't use it. This is much faster and more efficient. Note that the code for finding unique permutations is heavily inspired by this post. First the helper function that generates the unique permutations
function partitions = uniquesets(setcount, setlength)
%generate unique partitions of P sets of length N
%number of partitions = factorial(P*N) / factorial(N)^P
%setcount: the number of sets in the partition
%setlength: the length of each set in the partition
%partitions: a K*L array, where K = setcount*setlength and L = factorial(setcount*setlength) / factorial(setlength)^setcount. Each column is a partition. Rows are set indices from 1 to setcount
lefttoplace = setcount * setlength;
partitions = zeros(lefttoplace, 1);
nperms = 1;
for subset = 1:setcount
locs = nchoosek(1:lefttoplace, setlength).';
[emptyrows, ~] = find(~partitions);
emptyrows = reshape(emptyrows, [], nperms);
destrows = emptyrows(locs, :);
nperms = nperms * size(locs, 2);
destcols = repelem((1:nperms)', setlength);
partitions = repelem(partitions, 1, size(locs, 2)) + accumarray([destrows(:), destcols], subset);
lefttoplace = lefttoplace - setlength;
end
end
Then the code to use it:
P = [4.5; 4.5; 5.25; 5.25; 2.25; 0.6; 0.5; 0.6; 0.5; 0.3; 1; 2; 0.5; 0.5];
phase = 3;
P = [P; zeros(mod(-numel(P), phase), 1)];
partitions = uniquesets(phase, numel(P)/phase);
[~, order] = sort(partitions, 1);
allmatrices = reshape(P(order), numel(P)/phase , phase, []);
colsum = sum(allmatrices, 1);
distance = sum(abs(colsum - sum(P)/3), 2);
[minsum, whichmatrix] = min(distance)
chosenmatrix = allmatrices(:, :, whichmatrix)
mincolsum = colsum(1, :, whichmatrix)
This is very quick, about half a second on my machine and will work for much longer P.
It still does create a few duplicate matrices as duplicate numbers in P are still considered distinct for the purpose of partitioning. However, it does not creat unecessary partitions.
-------
Original answer, which wasn't efficient at all
Bearing in mind this is not my domain at all (this is a problem for John), here is a solution that works for this problem size but is probably very inefficient.
It uses this fileexchange entry to find all partitions of size 3 of your P vector. This is very inefficient since of these, you only want the partitions where each subset has size 5. Still, it's better than computing all permutations. In addition, duplicate numbers are considered distinct for the partitioning problem, again giving you more partitions than necessary:
P = [4.5; 4.5; 5.25; 5.25; 2.25; 0.6; 0.5; 0.6; 0.5; 0.3; 1; 2; 0.5; 0.5];
parts = partitions([P;zeros(mod(-numel(P), 3), 1)], 3); %generates 2375101 partitions, takes a little while
parts = parts(cellfun(@(c) all(cellfun('length', c) == 5), parts)); %only keep partitions with subsets of length 5, there are 126126 of these. Also takes a while
allmatrices = cellfun(@(c) [c{:}], parts, 'UniformOutput', false); %convert each partition into a 5x3 matrix
allmatrices = cat(3, allmatrices{:}); %and convert into a 5x3x126126 matrix.
%note that some matrices are duplicated since as said, duplicate numbers were considered distinct for the partitioning purpose
[minsum, which] = min(sum(abs(sum(allmatrices, 1) - sum(P)/3), 2))
selectedmatrix = allmatrices(:, :, which)
columnsum = sum(selectedmatrix, 1)
columneps = columnsum - sum(P)/3
minsum =
0.76667
which =
29036
selectedmatrix =
4.5 5.25 5.25
4.5 2.25 1
0.5 0.6 2
0.3 0.6 0.5
0 0.5 0.5
columnsum =
9.8 9.2 9.25
columneps =
0.38333 -0.21667 -0.16667

  4 Comments

Show 1 older comment
quite don't understand what this does
for subset = 1:setcount
locs = nchoosek(1:lefttoplace, setlength).';
[emptyrows, ~] = find(~partitions);
emptyrows = reshape(emptyrows, [], nperms);
destrows = emptyrows(locs, :);
nperms = nperms * size(locs, 2);
destcols = repelem((1:nperms)', setlength);
partitions = repelem(partitions, 1, size(locs, 2)) + accumarray([destrows(:), destcols], subset);
lefttoplace = lefttoplace - setlength;
end
quite don't understand what this does
As I wrote, it's based on this code by Bruno Luong. It calculates all unique permutations of a set of numbers (setcount) each repeated setlength times. Each column of the output is a different permutation.
E.g for 3 sets (3 columns) of numbers repeated 5 times, you get:
>> partitions = uniquesets(3, 5);
>> partitions(:, 1:10) %look at the 1st 10 columns
ans =
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 3 3 3 3
2 3 3 3 3 3 2 2 2 2
3 2 3 3 3 3 2 3 3 3
3 3 2 3 3 3 3 2 3 3
3 3 3 2 3 3 3 3 2 3
3 3 3 3 2 3 3 3 3 2
3 3 3 3 3 2 3 3 3 3
So, each column gives you a different way of partitioning P into 3 columns. First partition column tells you to put the 1st 5 elements of P into the 1st column of G, the next 5 in the 2nd column, and the last 5 in the 3rd. The second partition column swaps P(9) and P(10). The 3rd partition swaps P(9) and P(11) compared to the 1st one, etc.
I've just realised it calculates too many partitions, since for your purpose partitions [1 1 1 1 1 2 2 2 2 2 3 3 3 3 3] and [3 3 3 3 3 2 2 2 2 2 1 1 1 1 1] are the same. It just swaps whole columns of G. Considering that it is fast enough as it is, improvement so that these permutations are not generated is left as an exercise to the reader.
Is this code capable of making every permutation different from the previous?
Yes, as said it generates too many, but it won't miss any. As I wrote as a comment in the code it generates permutations.
Now it's clear. thanks

Sign in to comment.


Answer by John D'Errico
on 8 Apr 2019
Edited by John D'Errico
on 8 Apr 2019

You are going mad because ...
  • Your problem is poorly formulated. i.e., how close to S/3 do you need to be?
  • For long vectors, your problem really is large, especially as you are trying to solve it.
For a vector of length n, there are 2^n subsets of elements of the vector. This is well known, but 2^n grows exponentially. Note that perms is not appropriate here, so at least you don't have factorial growth, but it is still big.
P=[4.5; 4.5; 5.25; 5.25; 2.25; 0.6; 0.5; 0.6; 0.5; 0.3; 1; 2; 0.5; 0.5];
numel(P)
ans =
14
Given the vector P of length 14, there are 2^14 subsets of elements to consider. We can think of them as the bits:
B14 = dec2bin(0:2^14-1);
>> B14(1:10,:)
ans =
10×14 char array
'00000000000000'
'00000000000001'
'00000000000010'
'00000000000011'
'00000000000100'
'00000000000101'
'00000000000110'
'00000000000111'
'00000000001000'
'00000000001001'
That was for the first 10 subsets we might choose. So each element of P is either included or excluded in a sum, as indicated by the corresponding bit.
(I'll discuss later the issue that it looks like you want at MOST 3 terms in the final sum.)
Now, your goal is essentially to find all such sums. In the end, you probably want to rank the sums, sorting them such that those with a sum closes to S/3 are at the top. But since you never say how close to S/3 you want to allow the sums to go, you effectively need to compute them all. This is not even remotely difficult for a vector of length only 14 though.
P = [4.5; 4.5; 5.25; 5.25; 2.25; 0.6; 0.5; 0.6; 0.5; 0.3; 1; 2; 0.5; 0.5];
n = numel(P);
S = sum(P);
binP = dec2bin(0:2^n-1) - '0';
allsums = binP*P;
[err,tags] = sort(abs(allsums - S/3),'ascend');
allsums = allsums(tags);
Here, the target sum is:
S/3
ans =
9.41666666666667
It turns out there are 57 distinct ways to form a sum of 9.4, but we can clearly never acheive 9.41666666666667.
allsums(1:100)'
ans =
Columns 1 through 11
9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4
Columns 12 through 22
9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4
Columns 23 through 33
9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4
Columns 34 through 44
9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4
Columns 45 through 55
9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4 9.4
Columns 56 through 66
9.4 9.45 9.45 9.45 9.45 9.45 9.45 9.45 9.45 9.45 9.45
Columns 67 through 77
9.45 9.45 9.45 9.45 9.45 9.45 9.45 9.45 9.45 9.45 9.45
Columns 78 through 88
9.45 9.45 9.45 9.45 9.45 9.45 9.45 9.45 9.45 9.45 9.45
Columns 89 through 99
9.35 9.35 9.35 9.35 9.35 9.35 9.35 9.35 9.35 9.35 9.35
Column 100
9.35
What were the 10 first such sums that were found? Represented bya 1 here where a given term was included in the sum, we see:
binP(tags(1:10),:)
ans =
0 0 0 1 1 0 0 1 0 1 0 0 1 1
0 0 0 1 1 0 0 1 0 1 1 0 0 0
0 0 0 1 1 0 0 1 1 1 0 0 0 1
0 0 0 1 1 0 0 1 1 1 0 0 1 0
0 0 0 1 1 0 1 1 0 1 0 0 0 1
0 0 0 1 1 0 1 1 0 1 0 0 1 0
0 0 0 1 1 0 1 1 1 1 0 0 0 0
0 0 0 1 1 1 0 0 0 1 0 0 1 1
0 0 0 1 1 1 0 0 0 1 1 0 0 0
0 0 0 1 1 1 0 0 1 1 0 0 0 1
We can extract the specific terms in one such subset sum as:
binP(tags(1),:).*P'
ans =
Columns 1 through 11
0 0 0 5.25 2.25 0 0 0.6 0 0.3 0
Columns 12 through 14
0 0.5 0.5
So, was this all easy? Well, yes. Trivially so, at least for a vector of length only 14. I could have succeeded up to around 20 elements in the vector P. If you increase the length of P by a significant amount, things will still go to hell. So you can effectively never generate all such sums for a vector of length 50. (At least not with the computers we have available today.) Even that operation for partial sums of 25 terms or so will be brutal.
Remember that we computed ALL such partial sums, because we really don't know when to stop looking at all possible subsets. How far away from S/3 are we allowed to go?
allsums(tags(end))
ans =
28.25
>> S
S =
28.25
>> binP(tags(end),:)
ans =
1 1 1 1 1 1 1 1 1 1 1 1 1 1
So one of those candidate sums was the sum of all elements, and since none of the elements of P were negative, that sum will lie way out in the weeds.
For large, long vectors, is there a better way? Well yes. You could write an algorithm that recursively generated subsets of P, then used a tolerance to decide if a sum was too far away. But even this can be very difficult to perform, because you may not be able to know how to generate that tolerance. This could get difficult. Doable. But sometimes difficult.
Now, suppose we want to solve the problem of finding all subsets of at most 3 terms in any candidate sum? This is most easily done using nchoosek. So, I might split the problem into finding all partial sums of 1, 2, or 3 terms, then combining them at the end.
S = sum(P);
Starget = S/3;
Stol = 0.2; % a tolerance on the sum error
T3 = nchoosek(P,3);
err = abs(sum(T3,2) - Starget);
T3(err > Stol,:) = [];
err(err> Stol) = [];
[err,tags] = sort(err,'ascend');
T3 = T3(tags,:);
This is now trivially easy to write, as you see. And it is efficient, since we only cared about at most 3 terms in the partial sums. In fact, there were only 9 partial sums of 3 terms that were within a tolerance of 0.2 of the target sum.
T3
T3 =
4.5 4.5 0.5
4.5 4.5 0.5
4.5 4.5 0.5
4.5 4.5 0.5
5.25 2.25 2
5.25 2.25 2
4.5 4.5 0.3
4.5 4.5 0.6
4.5 4.5 0.6
Now you could do the same set of operations for exactly 2 terms in the sum, then look at 1 term. Each time, keep only the partial sets that were within the designated tolerance on the sum.
The nice thing is, even for significantly longer vectors P, if you are never interested of partial sums of never more than 3 terms, this is still doable. Thus, for vectors of length 50 or 100,
nchoosek(50,3)
ans =
19600
>> nchoosek(100,3)
ans =
161700
We still only need to consider a few tens of thousands of combinations, or even 100K such partial sets. It will still get nasty for vectors with many thousands of elements to be considered, but you can always overwhelm any computational scheme if you try hard enough..
So don't go mad. Just use the proper tools.

  8 Comments

That is eactly what I gave you. In the first part of my answer, I showed how to generate ALL subsets. Thus every possible subset.
Then I showed how to compute ALL possible subsets of only 3 elements. Then I computed the sum of each, sorting them to be in increasing distance from the target. If you want to compute ALL possible subsets, then just don't use a tolerance at all.
Asfar as your comment that each sum needs to be as close as possible to the target sum, that is indeed achieved by my use of a sort and a tolerance. Surely that is what you mean by "as close as possible".'
Let me say it differently:
  • The word "all" implies every single subset.
  • The phrase "as close as possible" implies there are some subsets that are deemed to be unacceptable, because they are not sufficiently close. That is, they exceed some mininum deviation from the target.
However, you cannot have both all and some at the same time. All is all, and some is not all.In fact, I showed you how to do either of those choices. So what is your problem, if you are not happy with both the "all" solution and the "some" solution?
If you have some other problem, then you need to explain your problem far more clearly, because you are making no sense in these comments.
@John, the way I understood the problem, the purpose is not to find a single subset with a given number of elements. Instead it's to partition P into N subsets of equal size, then find which such partition has the minimal absolute difference from S/3 for the N sums of the subsets. So for the bins you're not looking for binary bins:
binP = [
0 0 0 1 1 0 0 1 0 1 0 0 1 1
0 0 0 1 1 0 0 1 0 1 1 0 0 0
...
but for bins of the form
binS = [
1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
1 1 1 1 2 1 2 2 2 2 3 3 3 3 3
1 1 1 1 2 2 1 2 2 2 3 3 3 3 3
... unique permutations of repelem(1:3, 5)
Once you have binS, it is trivial to compute the subset sums:
N = 3;
P = [P;zeros(mod(-numel(P), N), 1)]; %pad to multiple of N
[~, order] = sort(binS);
matrices = reshape(P(order)', [], N, size(binS, 1));
colsum = sum(matrices, 1);
distance = sum(abs(colsum - sum(P)/3), 2);
[~, chosenmatrix] = min(distance);
@john I have many problems but don't mind about me, i find a way to solve it that is ok for me.

Sign in to comment.