repeat large number of vector elements

Hi, I have a vector storing only unique values: v = (0, 1, 2), and another vector about frequency of the unique values: c = (3, 2, 4). Now I want to create a new vector with repeated values as follows:
v2 = (0, 0, 0, 1, 1, 2, 2, 2, 2)
Since both vectors v and c are very large, how to efficiently derive v2 in matlab. Many thanks.

1 Kommentar

Jan
Jan am 20 Feb. 2013
Please specify "very large" explicitly: Some users tell 1000 elements large, others need billions for this term.

Melden Sie sich an, um zu kommentieren.

Antworten (5)

Jan
Jan am 20 Feb. 2013
Bearbeitet: Jan am 20 Feb. 2013

2 Stimmen

This is a run-length encoding. A standard decoding:
v = [0, 1, 2];
c = [3, 2, 4];
d = cumsum(c);
index = zeros(1, d(end));
index(d(1:end-1)+1) = 1;
index(1) = 1;
index = cumsum(index);
result = v(index);
Azzi Abdelmalek
Azzi Abdelmalek am 20 Feb. 2013
Bearbeitet: Azzi Abdelmalek am 21 Feb. 2013

0 Stimmen

v=[0 1 2];
c=[3,2,4];
out=cell2mat(arrayfun(@(x) repmat(v(x),1,c(x)),1:numel(v),'un',0))
EDIT
i1=0;
out=zeros(1,sum(c));
for k=1:numel(v)
i0=i1+1;
i1=i0+c(k)-1;
out(i0:i1)=v(k)*ones(1,c(k));
end

5 Kommentare

Youssef  Khmou
Youssef Khmou am 20 Feb. 2013
i did not realize the c well .
Azzi Abdelmalek
Azzi Abdelmalek am 20 Feb. 2013
Bearbeitet: Azzi Abdelmalek am 20 Feb. 2013
or less compact but much faster
i1=0;
out=zeros(1,sum(c));
for k=1:numel(v)
i0=i1+1;
i1=i0+c(k)-1;
out(i0:i1)=v(k)*ones(1,c(k));
end
Youssef  Khmou
Youssef Khmou am 20 Feb. 2013
ok
Azzi Abdelmalek
Azzi Abdelmalek am 20 Feb. 2013
It seems that ones(1,n)*v is faster then repmat
Youssef  Khmou
Youssef Khmou am 20 Feb. 2013
right, 0.0054s with repmat, and 0.0027s with ones(1,n)

Melden Sie sich an, um zu kommentieren.

Youssef  Khmou
Youssef Khmou am 20 Feb. 2013
Bearbeitet: Youssef Khmou am 20 Feb. 2013

0 Stimmen

for i=1:length(c)
V{i}=(i-1)*ones(c(i),1);
end
f=cell2mat(V(:))'
Jan
Jan am 21 Feb. 2013
Bearbeitet: Jan am 21 Feb. 2013

0 Stimmen

Some timings (R2009a/64/Win7/Core2Duo):
x = rand(1, 10000);
n = randi([0,10], 1, 10000);
tic; for k=1:100; r=runlength(x,n); end; toc
% With runlength() is a function with the following contents:
% Azzi's ARRAYFUN approach:
r = cell2mat(arrayfun(@(k) repmat(x(k), 1, n(k)), 1:numel(x), 'un', 0))
Elapsed time is 48.976947 seconds.
% Azzi's approach from the comment section: [EDITED start]
Elapsed time is 2.674559 seconds.
% Azzi's approach from the comment section with slight modification:
i1 = 0;
result = zeros(1, sum(c));
for k = 1:numel(v)
i0 = i1+1;
i1 = i0 + c(k) - 1;
result(i0:i1) = v(k); % Without "*ones(1,c(k))" !!
end
Elapsed time is 1.656987 seconds. [EDITED end]
% Youssef' approach:
V = cell(1, length(n));
for k = 1:length(n)
V{k} = x(k) * ones(n(k), 1); % With x(k) instead of (i-1)
end
r = cell2mat(V(:))'
Elapsed time is 1.422626 seconds
% The CUMSUM method from my former answer:
Elapsed time is 0.083854 seconds.
% An equivalent MEX implementation:
Elapsed time is 0.025775 seconds.
Conclusions:
  • ARRAYFUN and the anonymous function is not efficient compared to a simple loop. Therefore I suggest to avoid this combination strictly.
  • Collecting the single parts in a cell is a good idea. Then CELL2MAT can be replaced by the faster FEX: Cell2Vec : 0.97 sec
  • Creating the large temporary vectors cumsum(x) and index seem to be not as bad as I was afraid. The C-MEX is not a dramatic enhancement anymore.

2 Kommentare

Azzi Abdelmalek
Azzi Abdelmalek am 21 Feb. 2013
You missed the code hiden in the comment.
Jan
Jan am 21 Feb. 2013
Bearbeitet: Jan am 21 Feb. 2013
Yes, Azzi, I haven't seen it. It is much faster, especially when the multiplication with ONES is omitted. A good improvement compared to ARRAYFUN!
I do not understand why this is slower than collecting the partial vectors in a cell array at first. It looks much more efficient.

Melden Sie sich an, um zu kommentieren.

irina mihai
irina mihai am 6 Dez. 2015

0 Stimmen

Jan, Azzi Thanks for your help ! I used the cumsum function and the v(index) idea which i didn't know. I have a very large database (over 10^6 lines) and I was looking for something like this. Brilliant !

1 Kommentar

Jan
Jan am 6 Dez. 2015
Bearbeitet: Jan am 27 Dez. 2015
See repelem in modern Matlab versions and FEX: RunLength, which is about twice as fast as repelem.

Melden Sie sich an, um zu kommentieren.

Kategorien

Mehr zu Loops and Conditional Statements finden Sie in Hilfe-Center und File Exchange

Tags

Gefragt:

am 20 Feb. 2013

Bearbeitet:

Jan
am 27 Dez. 2015

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by