Finding if a vector is a subset

Asked by Harel Harel Shattenstein

on 10 Apr 2018
Latest activity Commented on by Roger Stafford

on 10 Apr 2018
Accepted Answer by Roger Stafford

I am trying to build a function that, for
a=[1 2] b=[1 3 2 9 5]
will return false, and for
a=[1 2] b=[1 2 2 9 5]
will return true (that is, a has to appear in b as consecutive elements).
What I have managed to do so far is:
function[yn] = subset1(v1,v2)
yn=0;
n=length(v1);
m=length(v2);
v=[];
if n<=m
for i=1:n
for j=1:(m-n+1)
while (v1(i)==v2(j))
v(end+1)=v1(i);
i=i+1;
j=j+1;
end
end
end
end
if length(find(v))==length(find(v1)) && find(v)==find(v1)
yn=1;
end
if n>m
for i=1:m
for j=1:(n-m+1)
while ([v2(i)]==v1(j))
v(end+1)=v2(i);
i=i+1;
j=j+1;
end
end
end
end
if length(find(v))==length(find(v2)) && find(v)==find(v2)
yn=1;
end
But it does not work in the first case.

David Fletcher

on 10 Apr 2018
There might be some mileage in investigating if existing string comparison functions will do what you need with a bit less effort.
a=[1 2]
b=[1 3 2 9 5]
c=[1 2 2 9 5]
strfind(num2str(b),num2str(a))
strfind(num2str(c),num2str(a))
Walter Roberson

on 10 Apr 2018
The num2str() turns out not to be needed.

Answer by Roger Stafford

on 10 Apr 2018

The following should be faster:
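m = size(a,2);   % length of a (assumes a is a row vector)
n = size(b,2);   % length of b (assumes b is a row vector)
s = false;       % stays false if b is shorter than a, so the loop never runs
for k = 1:n-m+1
    s = all(a==b(k:k+m-1));   % compare a with the m-length section of b starting at k
    if s, break, end          % stop at the first match
end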
Logical s will be true if any m-length section of b is equal to the a vector.

Steven Lord

on 10 Apr 2018
I haven't tried this to see if it's faster, but find-ing the first element of a in b and iterating over only those starting points using the technique your code uses may help. I suspect adding that initial search would be particularly useful if a(1) is relatively rare in b.
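A rough sketch of that idea, untested for speed. The function name hasRun is made up for illustration, and the code assumes a and b are row vectors and a is not empty:

```matlab
function s = hasRun(a, b)
% hasRun  (hypothetical name) true if row vector a occurs as a run of
% consecutive elements in row vector b. Assumes a is non-empty.
m = numel(a);
n = numel(b);
s = false;
% Only positions where b equals a(1), and where a full m-length
% section of b still fits, can start a match.
starts = find(b(1:n-m+1) == a(1));
for k = starts
    if all(a == b(k:k+m-1))   % compare the whole section starting at k
        s = true;
        break
    end
end
end
```

With the vectors from the question, hasRun([1 2],[1 3 2 9 5]) and hasRun([1 2],[1 2 2 9 5]) can be used to check both cases. If b is shorter than a, starts is empty and the loop never runs.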
Roger Stafford

on 10 Apr 2018
My "faster" claim was in reference to Shattenstein's code.

Rik

on 10 Apr 2018

strfind should be an option, especially if you only have positive integer scalars, which you can just cast to char. Otherwise, the solution below might also be an option. It might not scale really well to huge vectors due to that convolution, but that is done on a binary matrix, so that should be as fast as it can be.
Another note: this uses implicit expansion, so if you don't have R2016b or newer, you'll have to use bsxfun.
a=[1 2];b1=[1 3 2 9 5];b2=[1 2 2 9 5];
%requires implicit expansion (use bsxfun on R2016a and earlier)
HasMatch=@(a,b) any(any(conv2(b'==a,logical(eye(length(a))),'same')==length(a)));
HasMatch(a,b1)
HasMatch(a,b2)
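For releases without implicit expansion, the same comparison might be written with bsxfun along these lines. This is only a sketch: the name HasMatchLegacy is made up here, and the double() cast is an extra precaution in case conv2 does not accept logical input on an older release:

```matlab
a=[1 2];b1=[1 3 2 9 5];b2=[1 2 2 9 5];
% bsxfun(@eq,b',a) builds the same length(b)-by-length(a) comparison
% matrix as b'==a does with implicit expansion
HasMatchLegacy=@(a,b) any(any(conv2(double(bsxfun(@eq,b',a)),eye(length(a)),'same')==length(a)));
HasMatchLegacy(a,b1)
HasMatchLegacy(a,b2)
```

As with the version above, a and b are assumed to be row vectors.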

Walter Roberson

on 10 Apr 2018
>> a=[1 2]; b=[1 3 2 9 5];
>> strfind(b,a)
ans =
[]
>> a=[1 2], b=[1 2 2 9 5]
a =
1 2
b =
1 2 2 9 5
>> strfind(b,a)
ans =
1
This is not a documented use for strfind() but it has worked for quite some time.
You do not need to convert to char: it is happy to search on char, integer-valued doubles, logical, even floating point numbers. Do note, however, that it looks for bitwise exact matches, with no tolerance at all.
The value returned is the indices of the matches, so you can test isempty() to see if there was a match.
Walter Roberson

Walter Roberson (view profile)

on 10 Apr 2018
Oh yes: the one restriction here is that strfind() will only work with row vectors, not with column vectors.
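Putting the strfind() notes above together, a small wrapper might look like the sketch below. The name containsRun is made up for illustration, and it relies on the undocumented numeric behaviour of strfind() described above, so treat it as an assumption rather than a guaranteed interface:

```matlab
function tf = containsRun(a, b)
% containsRun  (hypothetical name) true if a occurs as a run of
% consecutive elements in b. Relies on strfind() accepting numeric
% row vectors, which is not a documented use.
a = a(:).';   % force row vectors, since strfind() will not
b = b(:).';   % work with column vectors
tf = ~isempty(strfind(b, a));   % empty result means no match
end
```

Because the matching is exact, floating point inputs that differ only by round-off will not be treated as equal.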