MATLAB Answers

0

Create binary vectors with specific hamming distance

Asked by Balázs Szabó on 16 Feb 2019
Latest activity Commented on by Balázs Szabó on 16 Feb 2019
Hi,
Let's say I have a binary vecor [0 0 1 0 0 1 0 0] and I want to create all possible vector that are a specified hamming distance from the original vector.
How can I do that?
(I was trying to create a mask that has only the specified number of bits as 1 and the rest is 0, but I wasn't able to get all the different such vectors)

  0 Comments

Sign in to comment.

1 Answer

Answer by John D'Errico
on 16 Feb 2019
Edited by John D'Errico
on 16 Feb 2019
 Accepted Answer

Why does this seem pretty easy, if you just think about what Hamming distance means?
In a binary vector, all that matters is the number of positions that are different. And in a binary ector, if a position is different at all, that just means that bit is flipped.
Given some binary vector, I'll call it B.
B = [0 0 1 0 0 1 0 0];
How many other vectors exist that lie at a Hamming distance of say 3? B has 8 elements. So if a different vector lies at aHamming distance of 3, then EXACTLY 3 elements have their bits flipped. So how many such vectors are there? nchoosek(8,3), which if my back of the envelope arithmetic is correct, is 56.
Those differences lie at positions
NB = numel(B);
NHD = 3;
NC = nchoosek(NB,NHD);
changes = nchoosek(1:NB,NHD)
1 2 3
1 2 4
1 2 5
...
which is an array with 56 rows, and 3 columns. Therefore, all vectors with a Hamming distance of exactly 3 can be written as
B3 = repmat(B,NC,1);
ind = sub2ind([NC,NB],repmat((1:NC)',1,NHD),changes);
B3(ind) = ~B3(ind);
So each row of B3 lies at a Hamming distance of 3 from B, and no such other vectors exist.
Note for long vectors, this computation may get EXTREMELY memory intensive, if the Hamming distance is at all close to half the length of B. For example, how many vectors lie at a Hamming distance of 50 from any binary vector of length 100?
nchoosek(100,50)
Warning: Result may not be exact. Coefficient is greater than 9.007199e+15 and is only accurate to 15 digits
> In nchoosek (line 92)
ans =
1.0089e+29
Too many to generate, that is how many.

  1 Comment

Thank you! Very detailed and good answer!

Sign in to comment.