Comparing each vector in a cell array to another vector

8 Ansichten (letzte 30 Tage)
Christopher Tran
Christopher Tran am 5 Feb. 2017
Bearbeitet: Stephen23 am 5 Feb. 2017
Hello. I am wondering if there is an easier way to check vectors in my cell array are subsets of another given vector. I want to keep track of how many times the vectors in C are found in T, for many different T's. For example:
T = [1 2 3 4 5 6]
C = { [1 2], [3 4], [5,6], [7 8], [9, 10] }
% loop through each candidate
for j = 1:numCands
c = Ck{j};
% Candidate Counts. Check if candidate is a subset of the
% current transaction
if ismember(c,t)
candCounts(j) = candCounts(j) + 1;
end
% Candidate Tail Counts. Check if the tail of the candidate is
% a subset of the current transaction
if ismember(c(2:end),t)
candTailCounts(j) = candTailCounts(j) + 1;
end
end
What I have done is loop through each member of C and check using ismember() for each element. But this gets really long when C becomes very large. I was wondering if there was any clever way of vectorization which would result in a faster runtime. I would also like to add that T can be varying sized vectors. Not sure if that information is relevant or not.
Thanks in advance!
Edit: For future references here is what I ended up doing, which is similar to what the answers suggested:
% Make all candidates into matrix for faster calculation
allCands = vertcat(Ck{:});
% Get number of columns to get where there are subsets.
[~,col] = size(allCands);
% Find locations where the candidates are subsets of the
% transactions for both the candidates and tail counts.
memb1 = ismember(allCands,t);
memb2 = ismember(allCands(:,2:end),t);
% Subset check
subCheck1 = find(sum(memb1,2) == col);
subCheck2 = find(sum(memb2,2) == col - 1);
% Increment counts
candCounts(subCheck1) = candCounts(subCheck1) + 1;
candTailCounts(subCheck2) = candTailCounts(subCheck2) + 1;
  1 Kommentar
Walter Roberson
Walter Roberson am 5 Feb. 2017
Bearbeitet: Walter Roberson am 5 Feb. 2017
Does it happen to be the case that C and T only have integer elements in the range 0 to 65535 ?
Are all of the entries in C the same length? Are they always row vectors?

Melden Sie sich an, um zu kommentieren.

Akzeptierte Antwort

Stephen23
Stephen23 am 5 Feb. 2017
Bearbeitet: Stephen23 am 5 Feb. 2017
Method one: one ismember call: the function ismember is going to to be slowest operation, so one simple speedup is to reduce how many times you call it: Note that a vector c will only be a member if the tail c(2:end) is a member, so you can test for the tail first, and only if the tail is a member do you need to check if the first element is also a member:
T = [1,2,3,4,5,6];
C = {[1,2],[3,4],[5,6],[7,8],[9,10]};
cntC = 0;
cntT = 0;
for k = 1:numel(C)
if ismember(C{k}(2:end),T)
cntT = cntT+1;
cntC = cntC+any(C{k}(1)==T);
end
end
Method two: Convert to a matrix: if all members of C are the same size vectors, then convert C to a matrix and perform one simple comparison using bsxfun:
M = vertcat(C{:});
U = reshape(T,1,1,[]);
X = any(bsxfun(@eq,M,U),3);
xT = all(X(:,2:end),2);
xC = xT&X(:,1);
cntT = sum(xT)
cntC = sum(xC)
  5 Kommentare
Stephen23
Stephen23 am 5 Feb. 2017
"The problem with that is if the elements occur out of order, you would still get a match."
That is what a subset is, and the original question specifically asks for "check vectors in my cell array are subsets of another given vector".
For a subset the order is irrelevant.
Stephen23
Stephen23 am 5 Feb. 2017
Bearbeitet: Stephen23 am 5 Feb. 2017
@Christopher Tran: I am happy to take a look. Please clarify one point: do you want to match the values in the vectors in the same order, or is the order irrelevant (as per set theory and the definition of subset). For example, is [2,1] a subset of [1,2,3] or not?

Melden Sie sich an, um zu kommentieren.

Weitere Antworten (1)

Walter Roberson
Walter Roberson am 5 Feb. 2017
If all members of C are the same length then you can use buffer() with overlap to create an array containing all the subsequences of that length, and transpose that and then convert C to a row matrix and then ismember() with rows option
  1 Kommentar
Christopher Tran
Christopher Tran am 5 Feb. 2017
Bearbeitet: Christopher Tran am 5 Feb. 2017
Hi Walter, Thanks for the response! The elements in C are indeed the same sized vectors.
I am not sure what is buffer(), I may take a look at it, but I ended up employing a similar idea to yours using vertcat() (similar to how Stephen did), but used sum(), and find() to get the indices where C is a subset of T (note I copied and pasted my code, so all 't's, are 'T's from my above example. Would it be necessary to use the sum() or can i just use the result from ismember()? ismember() would produce a vector of logicals and using the sum() made more sense to me, but I'm not sure. Also I'm not sure how much of an effect it would be on the speed, and also compared to using buffer().
% Make all candidates into matrix for faster calculation
allCands = vertcat(Ck{:});
% Get number of columns to get where there are subsets.
[~,col] = size(allCands);
% Find locations where the candidates are subsets of the
% transactions for both the candidates and tail counts.
memb1 = ismember(allCands,t);
memb2 = ismember(allCands(:,2:end),t);
% Subset check
subCheck1 = find(sum(memb1,2) == col);
subCheck2 = find(sum(memb2,2) == col - 1);
% Increment counts
candCounts(subCheck1) = candCounts(subCheck1) + 1;
candTailCounts(subCheck2) = candTailCounts(subCheck2) + 1;

Melden Sie sich an, um zu kommentieren.

Kategorien

Mehr zu Operators and Elementary Operations finden Sie in Help Center und File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by