Big Data tall array: gathering a part of a tall array does not work

I have a huge set of data that I saved as a txt file on my hard disk.
I want to do some calculations on the data that tall array calculations do not support.
My solution was to gather a chunk of data each time, do the calculations, and then print the results to a txt file.
To get each chunk, I used a window of 10000 rows. For instance, I gather rows 10001 to 20000 from the tall array, do the calculations, and then save/print the results to another file. Here is an example of what I want to do:
cost_temp = tall(something);
window_pan = 10000;   % number of rows per chunk
for i = 1:10
    % rows (i-1)*window_pan+1 through i*window_pan of the tall array
    Temp = cost_temp((i-1)*window_pan+1 : i*window_pan, :);
    [best_Cost, index_cost] = min(gather(Temp), [], 2);
end
The method works until row 70000, and after that I get this error:
Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 2: Completed in 6.1 sec
- Pass 2 of 2: Completed in 2.4 sec
Evaluation completed in 11 sec
Error using tall/gather>iAssertAdaptorMatches (line 126)
Internal problem while evaluating tall expression. The problem was:
An internal consistency error occurred. Details:
SIZE of output incorrect. Expected: [10000 NaN], actual: [1018 21].
Error in tall/gather>iGather (line 73)
cellfun(@iAssertAdaptorMatches, gatheredTalls, varargin(isArgTall));
Error in tall/gather (line 50)
[varargout{:}, readFailureSummary] = iGather(varargin{:});
Error in ...
Error in tall/gather (line 50)
[varargout{:}, readFailureSummary] = iGather(varargin{:});
Error in second_attempt (line 241)
[best_Cost, index_cost] = min(gather(Temp),[], 2);
Caused by:
Error using tall/gather>iAssertAdaptorMatches (line 126)
An internal consistency error occurred. Details:
SIZE of output incorrect. Expected: [10000 NaN], actual: [1018 21].
Interestingly, when I gather the whole data set, there is no problem. However, I will need to apply this method to a larger data set, and I will not be able to gather all of that data at once.
Thanks!
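For reference, one way to process a text file in fixed-size chunks without indexing into a tall array is to read it block by block from a datastore. The following is only a minimal sketch under stated assumptions, not a confirmed fix for the error above: it assumes the data is a purely numeric, comma-delimited text file that tabularTextDatastore can parse. The file names inFile and outFile and the small generated matrix are hypothetical stand-ins for the real data, and ReadSize plays the role of window_pan.

```matlab
% Hypothetical stand-in for the large text file: a small numeric CSV
% written to a temporary location so that the sketch runs on its own.
inFile  = [tempname '.csv'];
outFile = [tempname '.txt'];
data = magic(25);
data = data(:, 1:21);              % 25 rows, 21 columns of made-up values
writematrix(data, inFile);

% Read the file in blocks of rows instead of indexing a tall array.
ds = tabularTextDatastore(inFile, 'ReadVariableNames', false);
ds.ReadSize = 10;                  % rows per chunk (10000 in the question)

fid = fopen(outFile, 'w');
nRows = 0;
while hasdata(ds)
    % The final chunk may hold fewer than ReadSize rows.
    chunk = table2array(read(ds));
    [best_Cost, index_cost] = min(chunk, [], 2);
    % One output line per input row: minimum value, column index.
    fprintf(fid, '%g,%d\n', [best_Cost, index_cost].');
    nRows = nRows + size(chunk, 1);
end
fclose(fid);
```

Because each call to read returns an ordinary in-memory table, any function can be applied to the chunk, including ones that tall arrays do not support. The loop also does not need to know the total number of rows in advance.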
  2 Comments
TOSA2016 on 1 Oct 2019
Hi Guillaume,
The height of the array is 273000 rows. I intentionally chose 10 as the maximum number of iterations in the for loop so that it would not affect the question. Thanks for pointing it out.
Let me check mapreduce to see if it helps.
Thanks for your response.

Answers (0)