I am trying to find vectorized matlab function ind = item2ind(item,t) to solve following problem: I have a list of row vectors
item = [2 3 1; 2 1 2; 3 1 1; 1 3 3]
and vector of all possible item elements at each item row vector
t = [1 1 2 2 2 3 3];
I need to find indexes of of separate item rows elements corresponding to the vector t in this way:
ind = [3 6 1; 3 1 4; 6 1 2; 1 6 7]
But
item = [1 1 1]
does not correspond to the vector t, because there are 3 "1" elements, and t contains only 2 "1" elements.
Note: Serial version is inefficient for large item (10000 x 100) and t (1 x 200).
function ind = item2ind(item,t)
[nlp,N] = size(item);
ind = zeros(nlp,N);
for i = 1:nlp
auxitem = item(i,:);
auxt = t;
for j = 1:N
I = find(auxitem(j) == auxt,1,'first');
if ~isempty(I)
auxt(I) = 0;
ind(i,j) = I;
else
error('Incompatible content of item and t.');
end
end
end
end
Additional remarks:
Most of the time is spent on the line:
I = find(auxitem(j) == auxt,1,'first');
Is there any clever trick how to speed up this line of code? I tried this, for example, but without any speedup:
I = ipos(auxitem(j) == auxt); I = I(1);
where ipos is preallocated as:
ipos = 1:length(t);
Thanks in advance for any help ...

6 Kommentare

Wick
Wick am 9 Mai 2018
Bearbeitet: Wick am 9 Mai 2018
And what should be the result of item = [1 1 1], ind = [1 2 NaN]?
And that should be on a per-row basis?
Is 't' always sorted?
Edit: I see that you're failing out of the subroutine answers my first two questions. So then we just need to know if 't' is sorted.
Michal
Michal am 9 Mai 2018
Bearbeitet: Michal am 9 Mai 2018
No, in general, vector t is not sorted!
Edit: I just fix typo in my function.
Wick
Wick am 9 Mai 2018
Bearbeitet: Wick am 9 Mai 2018
Well, that's going to make a mess when you want to keep track of indices - but I think we can manage.
I've got something that should loop through the unique elements of t to do what you want. Now I've got to figure out the not-sorted issue.
Edit: No worries, 200 extra 'find' commands on 't' will only add ~2 milliseconds.
Michal
Michal am 9 Mai 2018
My concern is about "item" number of rows, which could be very large (up to 1e6).
Jan
Jan am 9 Mai 2018
Bearbeitet: Jan am 9 Mai 2018
@Michal: If you provide a relevant set of example data, we could test the speed and the correctness of the suggested code.
What is the maximum value of t?
Do you have a C compiler installed?
Michal
Michal am 9 Mai 2018
Bearbeitet: Michal am 9 Mai 2018
Please read discussion under Chad's answer.
Typical length of "t" is 30-100, maxmimum value of elements at "t" is equal maximum value of elements at array "item".
Yes I have a C compiler...
Please, keep in mind, that vector "t" is not sorted.

Melden Sie sich an, um zu kommentieren.

 Akzeptierte Antwort

Michal
Michal am 11 Mai 2018
Bearbeitet: Michal am 11 Mai 2018

0 Stimmen

So far best solution:
function ind = item2ind_new(item,t)
t = t(:);
[m,n] = size(item);
mct = max(accumarray(t,1));
G = accumarray(t,1:length(t),[],@(x) {sort(x)});
G = cellfun(@(x) padarray(x.',[0 mct-length(x)],0,'post'), G, 'UniformOutput', false);
G = vertcat(G{:});
C = cumsum(reshape(item,m,1,n)==item,3);
ia = C(sub2ind(size(C),repelem((1:m).',1,n),repelem(1:n,m,1),repelem(1:n,m,1)));
ind = G(sub2ind(size(G),item,ia));
Any idea how to improve it?

Weitere Antworten (2)

Jan
Jan am 9 Mai 2018
Bearbeitet: Jan am 9 Mai 2018

1 Stimme

function ind = item2ind(item, t);
maxRun = length(t) + 1;
[T , TI] = accumsort(t, maxRun);
ind = zeros(size(item));
for k = 1:size(item, 1)
[aItem, aItemI] = accumsort(item(k, :), maxRun);
% [m, index] = ismember(aItem, T);
% Faster with undocumented function:
[m, index] = builtin('_ismemberhelper', aItem, T);
if all(m)
ind(k, aItemI) = TI(index);
else
error('Incompatible item.');
end
end
end
function [T, TI] = accumsort(t, maxRun)
[sortedT, TI] = sort(t);
T = sortedT * maxRun;
c = -1;
for k = 1:numel(T)
if T(k) ~= c
d = 0;
c = T(k);
else
d = d + 1;
end
T(k) = T(k) + d;
end
end
For some test data of size [10'000 x 100] I get a runtime of 0.21 sec instead of 1.3 sec of the original version.
With calling ismember the runtime is 0.41 sec. Internally ismember calls the helper function builtin('_ismemberhelper') for sorted data of type double. If it is known already, that the input is sorted, calling the internal function avoids the overhead.
If you have a C compiler, converting accumsort to a C-mex would be useful.
maxRun must a any number greater than the highest number of repetitions in t. length(t)+1 is guaranteed to be larger.

9 Kommentare

Michal
Michal am 9 Mai 2018
Bearbeitet: Michal am 9 Mai 2018
I tried to convert accumsort to C-mex (MATLAB Coder), but the mex file shows very poor performance (2x slower than MATLAB version). Actually, I am not familiar with manual C-mex file writing...
But the main problem is trade of between number of unique elements at vector "t" and overall performance. Chad's algorithm is significantly faster than yours in a case of low number unique elements of "t".
This is Chad's final code, for your info:
function ind = item2ind(item,t)
unique_t = unique(t);
ind = zeros(size(item));
% a single 'for' loop as long as the unique elements of t
for jj = 1:length(unique_t)
O = zeros(size(item));
O(item == unique_t(jj)) = 1;
positions_of_t = [0 find(t == unique_t(jj))];
% adding zero so sub_index call below will always reference a non-zero element
if max(sum(O,2)) > numel(positions_of_t)
error('Incompatible content of item and t.');
else
sub_index = cumsum(O,2) .* O + 1;
ind = ind + positions_of_t(sub_index);
% this is why we needed the 0 in positions_of_t above
end
end
Case1:
t = 1:30;
tt = repmat(t,1,1);
[~,p] = sort(rand(100000,30),2);
item = tt(p);
tic;ind1 = item2ind_Chad(item,tt);toc
tic;ind2 = item2ind_Jan(item,tt);toc
Elapsed time is 2.363570 seconds.
Elapsed time is 1.420776 seconds.
Case2:
t = 1:10;
tt = repmat(t,1,3);
[~,p] = sort(rand(100000,30),2);
item = tt(p);
tic;ind1 = item2ind_Chad(item,tt);toc
tic;ind2 = item2ind_Jan(item,tt);toc
Elapsed time is 0.804354 seconds.
Elapsed time is 1.414612 seconds.
Jan
Jan am 9 Mai 2018
You have at least a speed up of a factor 6.2 . I hope you are happy about this already. I try to create a C-Mex in addition. Wait a little bit...
Michal
Michal am 9 Mai 2018
My original code is very simple to parallelize by single change for-loop to parfor-loop. On my 12 core desktop PC is final speedup factor by parfor-loop about 10.
But, of course, you code is far more clever … :)
Jan
Jan am 9 Mai 2018
As far as I can see, my code can be parallelized also directly:
parfor k = 1:size(item, 1)
I've made a very interesting experiment with a C-Mex function:
// file: accumsortX.c
// mex -O accumsortX.c
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[]) {
double *t, *T, maxT, c, d;
size_t n;
t = mxGetPr(prhs[0]);
maxT = mxGetScalar(prhs[1]);
n = mxGetNumberOfElements(prhs[0]);
plhs[0] = mxCreateUninitNumericMatrix(1, n, mxDOUBLE_CLASS, mxREAL);
T = mxGetPr(plhs[0]);
c = *t - 1;
while (n-- != 0) {
if (c != *t) {
d = 0.0;
c = *t;
} else {
d += 1.0;
}
*T++ = maxT * *t++ + d;
}
return;
}
It has exactly the same speed as the Matlab version of accumsort - but I've moved the sorting to the main function:
function ind = item2ind_jan2(item,t)
maxT = max(t) + 1;
[T, TI] = sort(t);
T = accumsort(T, maxT);
ind = zeros(size(item));
for k = 1:size(item, 1)
[aItem, aItemI] = sort(item(k, :));
aItem = accumsort(aItem, maxT);
[m, index] = builtin('_ismemberhelper', aItem, T);
if all(m)
ind(k, aItemI) = TI(index);
else
error('item not compatible to t.');
end
end
end
function [T] = accumsort(t, maxT)
T = t * maxT;
c = -1;
for k = 1:numel(T)
if T(k) ~= c
d = 0;
c = T(k);
else
d = d + 1;
end
T(k) = T(k) + d;
end
end
Replacing accumsort() by accumsortX() does not change the run-time, even if I do not create a new vector in the Mex but write to the original array.
But let me repeat: I still have no idea how you measure the timings, because for your given input data:
t = 1:10;
tt = repmat(t,1,3);
[~,p] = sort(rand(100000,30),2);
item = tt(p);
I get the error message for incompatible items. Please clarify this detail, because I'm eager to reproduce your timings in any way.
Michal
Michal am 9 Mai 2018
Bearbeitet: Michal am 9 Mai 2018
Again very strange …
1. When I change for to parfor in your code I can not run this code in parallel, because oft his error: Valid indices for 'ind' are restricted n PARFOR loops on this line of your code:
ind(k, aItemI) = TI(index);
2. Oh yes...:), sorry, you must call the function
ind = item2ind(item,tt)
not
ind = item2ind(item,t)!!!
This was mentioned in above discussion here, see Case1 and Case 2.
'tt' or 'tbig' plays a role of 't' now, to simulate multiple non-unique value elements. That is all!!!
Wick
Wick am 10 Mai 2018
Michal, what sort of computer are you running on? I've got a Threadripper with more cores and you're running circles around me on your timings. I had heard that the math libraries aren't optimized for AMD processors and clearly that appears to be the case.
Michal
Michal am 10 Mai 2018
Intel® Core™ i7-6800K Processor + X99 Asus motherboard + 64GB RAM + 240GB SSD Intel Pro disk (system) + 2TB HDD (Data) + GTX TITAN (CUDA) GPU
Ubuntu Linux 16.04 (64bit)
Wick
Wick am 10 Mai 2018
Interesting. Jan's code is always faster on my computer than the numbers you gave. My code is faster on my machine for the short runs but for the long runs your machine is faster. It appears your memory subsystem is a bit more efficient.
Michal
Michal am 10 Mai 2018
@Jan any progress or new ideas on your site?

Melden Sie sich an, um zu kommentieren.

Wick
Wick am 9 Mai 2018
Bearbeitet: Wick am 9 Mai 2018

0 Stimmen

Here you go. At the sizes you suggested, this shouldn't take too long. It has a single 'for' loop that cycles through the unique values of 't'.
I'm using logical indexing to identify all the elements in 'item' that match the given unique 't' and summing across the row. If the sum exceeds the number of times that value showed up in 't' you get your error. Otherwise I'm using 'cumsum' in a creative fashion (in my ever so humble opinion) to provide the indexes back to the location of the unique value in the original vector 't'.
Good Luck!
function ind = item2ind(item,t)
unique_t = unique(t);
ind = zeros(size(item));
try
% a single 'for' loop as long as the unique elements of t
for jj = 1:length(unique_t)
O = zeros(size(item));
O(item == unique_t(jj)) = 1;
positions_of_t = [0 find(t == unique_t(jj))];
% adding zero so sub_index call below will always reference a non-zero element
sub_index = cumsum(O,2) .* O + 1;
ind = ind + positions_of_t(sub_index);
% this is why we needed the 0 in positions_of_t above
end
catch
error('Incompatible content of item and t.');
end

12 Kommentare

Michal
Michal am 9 Mai 2018
Your vectorized code with loop over unique values of "t" is definitely better (faster), even in a case of simple change for-loop to parfor-loop in my function. But I need some time to understand your code in all details :)
Thank a lot ...
Wick
Wick am 9 Mai 2018
Test case for this code. I moved 't' to be out of order.
items =
2 3 1
2 1 2
3 1 1
1 3 3
t =
1 2 2 1 2 3 3
ind =
2 6 1
2 1 3
6 1 4
1 6 7
In terms of your concern that 'item' may get large, that shouldn't slow this down particularly much. In the end you've got to consider a logical index of every 't' to every 'item'. We managed to keep that all vectorized and the only 'for' loop is through the elements of (unique(t)).
Wick
Wick am 9 Mai 2018
I changed my code so it's a bit faster. I got rid of the if/then statement each loop iteration. If there are too many elements to be assigned to the values in 'positions_of_t' the code is already going to error. In which case, we just let it error. However, since you had a specific error message in mind, the try/catch/end structure is wrapped around the whole thing.
That should save several hundred calls to if/then.
Michal
Michal am 9 Mai 2018
Bearbeitet: Michal am 9 Mai 2018
I just found situation where is your code significantly slower than my original code with parfor-loop (do you have parallel toolbox???).
Example test case:
t = 1:50; % template for vector "t" (50 unique elements)
tbig = repmat(t,1,5); % creation of large "t" -> "tbig"
[~,p] = sort(rand(100000,length(tbig)),2); % creation of 1e5 permutations with 250 elements
item = tbig(p); % creation of "item"
tic;ind1 = item2ind_mycode(item,tbig);toc
tic;ind2 = item2ind_yourcode(item,tbig);toc
isequal(ind1,ind2)
Results:
mycode: Elapsed time is 3.956899 seconds.
yourcode: Elapsed time is 18.272204 seconds.
Any idea how to sped up this specific case (clear trade-off between number of unique elements of vector "t" and overall performance)???
Wick
Wick am 9 Mai 2018
Bearbeitet: Wick am 9 Mai 2018
I don't have the parallel toolbox and you have a fast computer. :) That took me 28 seconds.
It surprises me it's that slow. Let me play.
Edit: The trouble is the memory time spent assigning the values of positions_of_t. The following code is slower on my system but the loop should now be parallelizable. Therefore, you can try this with the parallel toolbox loop and see if it speeds things up.
function ind = item2ind(item,t)
unique_t = unique(t);
ind = zeros([size(item) numel(unique_t)]);
try
% a single 'for' loop as long as the unique elements of t
for jj = 1:length(unique_t)
O = zeros(size(item));
O(item == unique_t(jj)) = 1;
positions_of_t = [0 find(t == unique_t(jj))];
% adding zero so sub_index call below will always reference a non-zero element
sub_index = cumsum(O,2) .* O + 1;
ind(:,:,jj) = positions_of_t(sub_index);
% this is why we needed the 0 in positions_of_t above
end
catch
error('Incompatible content of item and t.');
end
ind = sum(ind,3);
Michal
Michal am 9 Mai 2018
Bearbeitet: Michal am 9 Mai 2018
Profiler shows the following bottleneck lines in your code:
57% ... ind = ind + positions_of_t(sub_index);
27% ... O(item == unique_t(jj)) = 1;
13% ... sub_index = cumsum(O,2) .* O + 1;
This typical vectorization performance problem started after significant JIT engine upgrade (> R2015 ?), when serial codes are much more faster in some specific cases than its vectorized alternatives.
Michal
Michal am 9 Mai 2018
Bearbeitet: Michal am 9 Mai 2018
Latest version of your code is a bit slower ... :(
mycode: Elapsed time is 3.652306 seconds.
yourcode (for-loop): Elapsed time is 22.581060 seconds.
yopurcode(parfor-loop): Elapsed time is 20.314874 seconds.
Michal
Michal am 9 Mai 2018
Bearbeitet: Michal am 9 Mai 2018
Just for your info, slightly diferent problem:
t = 1:10; % template for vector "t" (10 unique elements)
tbig = repmat(t,1,25); % creation of large "t" -> "tbig"
[~,p] = sort(rand(100000,length(tbig)),2); % creation of 1e5 permutations with 250 elements
item = tbig(p); % creation of "item"
tic;ind1 = item2ind_mycode(item,tbig);toc
tic;ind2 = item2ind_yourcode(item,tbig);toc
Produce following results:
mycode(parfor):Elapsed time is 3.627591 seconds.
yourcode(parfor): Elapsed time is 5.047966 seconds.
yourcode(for): Elapsed time is 4.782384 seconds.
Wick
Wick am 9 Mai 2018
Sorry, I don't think I'm of any further help. I'm pretty good at vectorizing code, but it appears in this case that leaving things as nested 'for' loops is the better choice.
I think the trouble is the non-sequential memory writes associated with the index assignments. MATLAB's just got too much overhead on those if the compiled 'for' loops are able to run faster.
Jan
Jan am 9 Mai 2018
For your test data:
t = 1:50;
tbig = repmat(t,1,5);
[~,p] = sort(rand(100000,length(tbig)),2);
item = tbig(p);
Your code from the question stops with the error "Incompatible content of item and t." So how can you measure the timings?!
The same for:
t = 1:10;
tbig = repmat(t,1,25);
[~,p] = sort(rand(100000,length(tbig)),2);
item = tbig(p);
I tried it with:
t = repmat(1:50, 1, 5);
t = t(randperm(length(t)));
item = zeros(10000, 100);
for k = 1:10000; item(k, :) = t(randperm(length(t), 100)); end
This let your (and my) item2ind() run successfully.
Do I miss a point?!
Michal
Michal am 9 Mai 2018
This is strange, my code perfectly works with both data case examples you mentioned above … ??!!
Wick
Wick am 9 Mai 2018
Jan,
My code is faster for small length 't' and much, much slower for large 't'. You vectorized in a completely different way than I did (and used an undocumented function but we won't use that against you). My question is, is there some rule of thumb my snippet of code didn't follow that I should change how I code things? I've always felt I was pretty good at vectorizing my MATLAB code but I've been coming here to learn how to be better. Obviously, you know some tricks I don't.
Thanks.

Melden Sie sich an, um zu kommentieren.

Kategorien

Mehr zu Parallel Computing Toolbox finden Sie in Hilfe-Center und File Exchange

Produkte

Gefragt:

am 9 Mai 2018

Bearbeitet:

am 11 Mai 2018

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by