Decreasing Computational Time with Parfor and variable slicing
Good day,
I am fairly new to parallel computing, and so far I feel I have been successful. However, I have written code that runs in parallel with parfor on a HUGE data set, on the order of 34000 x n, and I was wondering whether there is a way to make my computations even more efficient. I also get a message saying that a variable is indexed but not sliced in a parfor loop, and that this might result in unnecessary communication overhead. Here is a copy of my code:
softTFIDFMat = zeros(n,n);
parfor i=1:n
    temp = zeros(1,n);
    for j=i:n
        score = tfidfn(i,:)'*tfidfn(j,:).*jMat;
        score = sum(score(:));
        temp(j) = score;
    end
    softTFIDFMat(i,:) = temp;
end
tfidfn is a sparse matrix that is 34303 x n, where n is generally > 2000 and jMat is also a sparse double. Any help would be appreciated. Computational time is a little under 24 hours as of now.
Accepted Answer
Jan
on 17 Apr 2013
tfidfn(i,:)'*tfidfn(j,:)
This consumes much more time than column-oriented indexing:
tfidfnT = transpose(tfidfn);
...
score = tfidfnT(:, i) * tfidfnT(:, j)' .* jMat;
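To show where the transposed matrix fits into the loop from the question, here is a minimal sketch on small random stand-in data. The sizes `m` and `n` and the `sprand` inputs are assumptions made purely for illustration; only `tfidfn`, `tfidfnT`, `jMat` and `softTFIDFMat` come from the thread. The final cross-check relies on the identity sum(sum((x*y').*J)) = x'*J*y, which is not mentioned in the thread and is included here only to confirm the loop result.

```matlab
% Small random stand-ins for the real data (sizes are illustrative only).
m = 6;                               % number of rows compared (hypothetical)
n = 5;                               % number of columns (hypothetical)
tfidfn = sprand(m, n, 0.5);          % sparse m-by-n stand-in
jMat   = sprand(n, n, 0.5);          % sparse n-by-n stand-in

tfidfnT = transpose(tfidfn);         % transpose once, outside the loop

softTFIDFMat = zeros(m, m);
parfor i = 1:m
    temp = zeros(1, m);
    xi = tfidfnT(:, i);              % column access instead of row access
    for j = i:m
        % n-by-n outer product, weighted element-wise by jMat
        score = xi * tfidfnT(:, j)' .* jMat;
        temp(j) = full(sum(score(:)));
    end
    softTFIDFMat(i, :) = temp;
end

% Cross-check (assumption, not from the thread): the same upper triangle
% as a single matrix product, since sum(sum((x*y').*J)) equals x'*J*y.
ref = triu(full(tfidfn * jMat * tfidfn'));
maxDiff = max(abs(softTFIDFMat(:) - ref(:)));
```

If `maxDiff` is at rounding-error level on the real data too, the single product may be worth timing against the loop, since it avoids forming an n-by-n intermediate for every pair (i, j).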
3 Comments
Adam Filion
on 17 Apr 2013
Edited: Adam Filion
on 17 Apr 2013
It has to do with how MATLAB stores data in memory. For a matrix, it stores it column-wise, meaning that a matrix like
1 2 3
4 5 6
is stored in memory as
1
4
2
5
3
6
So when you grab a column like tfidfnT(:,i), that is a contiguous chunk of memory. A row, like tfidfnT(i,:), is non-contiguous, which is more time-consuming to work with, particularly for larger data sets.
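The storage order described above can be checked with linear indexing, which walks a full matrix in the same column-by-column order. A minimal sketch using the 2-by-3 example matrix:

```matlab
A = [1 2 3; 4 5 6];

linearOrder = A(:)';      % elements in storage order: [1 4 2 5 3 6]

col2 = A(:, 2);           % linear indices 3 and 4, adjacent in memory
row2 = A(2, :);           % linear indices 2, 4 and 6, strided in memory

colIdx = find(ismember(linearOrder, col2));   % positions of the column
rowIdx = find(ismember(linearOrder, row2));   % positions of the row
```

Here `colIdx` comes out as consecutive positions and `rowIdx` as every second position, which is the contiguous versus non-contiguous access pattern in question.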
EDIT
I didn't notice that your data was sparse. Are you using the sparse data type? If you are, then I'm not sure why it would make a difference, as I believe the sparse data type is stored differently from a normal matrix.
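On the sparse question: MATLAB keeps sparse matrices in a compressed column layout, so pulling out a column should still be cheaper than pulling out a row. One way to see whether that matters in practice is to time both. This is a rough sketch on random stand-in data; the matrix size, the density and the slice count `k` are arbitrary assumptions, and the timings will vary from machine to machine.

```matlab
S  = sprand(20000, 2000, 0.01);   % random sparse stand-in (sizes are arbitrary)
St = transpose(S);                % rows of S become columns of St

k = 200;                          % number of slices to time (arbitrary)

tic
for i = 1:k
    r = S(i, :);                  % row slice of the original
end
tRow = toc;

tic
for i = 1:k
    c = St(:, i);                 % same data, taken as a column slice
end
tCol = toc;

fprintf('row slices: %.4f s, column slices: %.4f s\n', tRow, tCol);
```

Both loops read the same values, so any difference in the two timings comes from the access pattern alone.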
More Answers (2)
Edric Ellis
on 17 Apr 2013
It looks as though each iteration of your PARFOR loop accesses every element of "tfidfn", so you cannot slice it. Even if "tfidfn" were dense, it's still "only" about 0.5GB, and so the transfer time for that to each worker is very likely to be completely insignificant compared to 24 hours for the complete computation.
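The 0.5 GB figure can be reproduced with a quick back-of-the-envelope calculation, assuming 8 bytes per double and taking n = 2000, the lower bound quoted in the question:

```matlab
rows = 34303;                    % row count from the question
n = 2000;                        % lower bound for n from the question
bytesPerDouble = 8;              % size of a MATLAB double

denseBytes = rows * n * bytesPerDouble;
denseGB = denseBytes / 1e9;

fprintf('dense size: %.2f GB\n', denseGB);   % prints "dense size: 0.55 GB"
```

A sparse matrix stores only its nonzero entries plus index data, so the amount actually sent to each worker should be smaller than this dense estimate unless the matrix is nearly full.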