Big Data tall array gathering a part of tall array does not work

TOSA2016 on 1 Oct 2019
Commented: TOSA2016 on 1 Oct 2019
I have a huge data set that I saved as a txt file on my hard disk.
I want to do some calculations on the data that tall arrays do not support.
My solution is to gather one chunk of data at a time, do the calculations, and then print the results to a txt file.
To get each chunk, I use a window of 10000 rows. For instance, I gather rows 10001 to 20000 from the tall array, do the calculations, and then save/print the results to another file. Here is an example of what I want to do:
cost_temp = tall(something);
window_pan = 10000;
for i = 1:10
    Temp = cost_temp((i-1)*window_pan + 1 : i*window_pan, :);
    [best_Cost, index_cost] = min(gather(Temp), [], 2);
end
The method works up to row 70000; after that I get this error:
Evaluating tall expression using the Parallel Pool 'local':
- Pass 1 of 2: Completed in 6.1 sec
- Pass 2 of 2: Completed in 2.4 sec
Evaluation completed in 11 sec
Error using tall/gather>iAssertAdaptorMatches (line 126)
Internal problem while evaluating tall expression. The problem was:
An internal consistency error occurred. Details:
SIZE of output incorrect. Expected: [10000 NaN], actual: [1018 21].
Error in tall/gather>iGather (line 73)
cellfun(@iAssertAdaptorMatches, gatheredTalls, varargin(isArgTall));
Error in tall/gather (line 50)
[varargout{:}, readFailureSummary] = iGather(varargin{:});
Error in ...
Error in tall/gather (line 50)
[varargout{:}, readFailureSummary] = iGather(varargin{:});
Error in second_attempt (line 241)
[best_Cost, index_cost] = min(gather(Temp),[], 2);
Caused by:
Error using tall/gather>iAssertAdaptorMatches (line 126)
An internal consistency error occurred. Details:
SIZE of output incorrect. Expected: [10000 NaN], actual: [1018 21].
Interestingly, when I gather the whole data set, there is no problem. However, I will need to apply this method to a larger data set, and I cannot gather all of that data at once.
Thanks!
2 Comments
Guillaume on 1 Oct 2019
Edited: Guillaume on 1 Oct 2019
What is the height of the array? Does the error occur on the last window, which may not have a height of 10000? At present, your code will only work if the number of rows in the whole file is an exact multiple of window_pan.
Maybe your workflow is more suited for mapreduce?
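The point about the last window can be sketched as follows. This is a minimal illustration only: the small in-memory matrix A and the names n_rows, n_windows, first and last are assumptions introduced here, not part of the original code. It only guards against a short final window and is not a confirmed fix for the internal consistency error reported above.

```matlab
% Hypothetical stand-in data: 25 rows, 3 columns (not the poster's file).
A = magic(25);
A = A(:, 1:3);
cost_temp = tall(A);

window_pan = 10;                         % window height in rows
n_rows = gather(size(cost_temp, 1));     % actual height of the tall array
n_windows = ceil(n_rows / window_pan);   % the last window may be shorter

% Results are collected in memory here only to keep the sketch short;
% for large data each chunk would be written to a file instead.
best_Cost = zeros(n_rows, 1);
index_cost = zeros(n_rows, 1);

for i = 1:n_windows
    first = (i - 1) * window_pan + 1;
    last = min(i * window_pan, n_rows);  % clamp the end of the final window
    Temp = gather(cost_temp(first:last, :));
    [best_Cost(first:last), index_cost(first:last)] = min(Temp, [], 2);
end
```

With 25 rows and a window of 10, the loop runs three times and the third window covers rows 21 to 25 only, so the index never runs past the end of the array.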
TOSA2016 on 1 Oct 2019
Hi Guillaume,
The height of the array is 273000 rows. I intentionally chose 10 as the maximum number of iterations in the for loop so that this would not affect the question. Thanks for pointing it out.
Let me check mapreduce to see if it helps.
Thanks for your response.


Answers (0)
