
Standard deviation takes forever

gujax on 12 Sep 2023
Commented: dpb on 13 Sep 2023
I have a double-precision numeric 3-D matrix M (converted to double by fread from uint8 data) of size 30000 x 500 x 500, and I would like to compute the standard deviation along dimension 2. Timed with
tic, std(M,0,2); toc
it has been running for more than 12 hours and still hasn't finished, whereas mean(M,2) took only 80 seconds.
Some more detail: std(M(:,:,1),0,2) takes 0.3 seconds and std(M(:,:,1:100),0,2) takes 34 seconds, but std(M(:,:,1:500),0,2) fails with an "out of memory" error.
Similarly, mean(M(:,:,1),2) takes 0.1 seconds and mean(M(:,:,1:500),2) also fails with "out of memory", yet mean(M,2) finishes in about 80 seconds. This is all very confusing! Thanks
dpb on 12 Sep 2023
Your original posting says "I have a double precision numeric 3D matrix M of size 30000 x 500 x 500..."
At 8 bytes per double, that is what I calculated above: roughly 60 GB of storage.
I don't follow what "an accumulation of (500 x 100 x 5) files each 31 KB in size" means.
I think you're going to have to show us specifically what your array is and how it was constructed.
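The sizes being discussed follow from simple arithmetic: 30000 x 500 x 500 is 7.5e9 elements, stored as 1 byte each in uint8 and 8 bytes each in double. A minimal sketch of that calculation (the variable names are illustrative only):

```matlab
% Array size from the question: 30000 x 500 x 500
dims  = [30000 500 500];
nElem = prod(dims);             % 7.5e9 elements

% Bytes per element: uint8 = 1, double = 8
bytesUint8  = nElem * 1;        % 7.5e9 bytes, about 7.5 GB
bytesDouble = nElem * 8;        % 6.0e10 bytes, about 60 GB (about 55.9 GiB)

fprintf('uint8:  %.1f GB\n', bytesUint8 / 1e9);
fprintf('double: %.1f GB (%.1f GiB)\n', bytesDouble / 1e9, bytesDouble / 2^30);
```

The same data therefore needs about eight times as much RAM once it is held as double.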
gujax on 12 Sep 2023 (edited 13 Sep 2023)
Ah, got it!
I append a 31 KB chunk of streaming time-series data to one file 100 x 500 x 500 times, instead of writing 5 million separate files.
So that's about 8 GB of data.
But when I read it back, I didn't realize that fread converts it to double by default, which takes 8 bytes per element instead of 1, so roughly eight times as much memory.
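One way to avoid the eightfold expansion is to give fread an explicit output class: the precision '*uint8' (equivalent to 'uint8=>uint8') keeps the samples as uint8, while plain 'uint8' returns double. A minimal sketch, assuming the file is a flat stream of uint8 samples in time-series-major order; the temporary file and the small sizes nT, nX, nY are hypothetical stand-ins so the example runs on its own:

```matlab
% Write a small uint8 test file (stand-in for the streamed data file)
nT = 30; nX = 5; nY = 4;                  % hypothetical, small illustrative sizes
data = randi([0 255], nT, nX, nY, 'uint8');
fname = [tempname '.bin'];
fid = fopen(fname, 'w');
fwrite(fid, data, 'uint8');               % written in column-major order
fclose(fid);

% Plain 'uint8' reads 1-byte values but returns them as double
fid = fopen(fname, 'r');
Adouble = fread(fid, Inf, 'uint8');
fclose(fid);

% '*uint8' (same as 'uint8=>uint8') keeps the data as uint8 in memory
fid = fopen(fname, 'r');
A = fread(fid, Inf, '*uint8');
fclose(fid);
delete(fname);

% fread returns a column vector; restore the 3-D layout
A = reshape(A, nT, nX, nY);
fprintf('default class: %s, explicit class: %s\n', class(Adouble), class(A));
```

Converting to double can then be deferred to the point where a function actually needs floating-point input, and applied to one piece of the array at a time.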


Accepted Answer

gujax on 13 Sep 2023
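Calculating the standard deviation with std takes more memory than calculating the mean, likely because std has to work with each element's deviation from the mean rather than just a sum. When std is run on large double-precision data sets and memory is limited, it will likely slow the computer down drastically (for example through disk swapping). That may not be the case for mean.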
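A common workaround, offered here as a sketch rather than as the method described in the answer, is to keep the data as uint8 and compute std a few pages (slices along dimension 3) at a time, converting only the current slab to double. Indexing such as M(:,:,k0:k1) creates a copy of the selected pages, which is consistent with M(:,:,1:500) running out of memory in the question, since that copies the whole array. The array sizes and the chunk size pagesPerChunk below are illustrative assumptions:

```matlab
% Small stand-in for M, kept as uint8 (1 byte per element)
M = randi([0 255], 300, 50, 40, 'uint8');
pagesPerChunk = 8;             % hypothetical tuning knob: pages converted at once

sz = size(M);
S = zeros(sz(1), 1, sz(3));    % std along dim 2 gives a 300 x 1 x 40 result
for k0 = 1:pagesPerChunk:sz(3)
    k1 = min(k0 + pagesPerChunk - 1, sz(3));
    % Only this slab is copied and converted to double, so the temporaries
    % std creates are limited to sz(1) x sz(2) x pagesPerChunk elements
    S(:, :, k0:k1) = std(double(M(:, :, k0:k1)), 0, 2);
end
```

Each row's standard deviation depends only on that row, so splitting along dimension 3 gives the same result as std(double(M),0,2) while bounding peak memory by the chunk size.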

More Answers (1)

Steven Lord on 12 Sep 2023
Can you confirm you're using the std function included in MATLAB? What does this command show?
which -all std
/MATLAB/toolbox/matlab/datafun/std.m
/MATLAB/toolbox/matlab/datatypes/tabular/@tabular/std.m      % tabular method
/MATLAB/toolbox/matlab/datatypes/datetime/@datetime/std.m    % datetime method
/MATLAB/toolbox/matlab/datatypes/duration/@duration/std.m    % duration method
/MATLAB/toolbox/matlab/timeseries/@timeseries/std.m          % timeseries method
/MATLAB/toolbox/matlab/bigdata/@tall/std.m                   % tall method
/MATLAB/toolbox/parallel/parallel/@distributed/std.m         % distributed method
gujax on 13 Sep 2023 (edited 13 Sep 2023)
I think I will mark this issue as resolved, with the conclusion posted as the accepted answer above.
dpb on 13 Sep 2023
The problem you're seeing must be disk swapping caused by limited physical memory. I'm still not sure exactly how big your array is. Could you post the output of
whos M
so we know precisely what you're processing, and of
memory
to show how much memory your machine has available? (The memory function is only available on Windows.)
It depends on how TMW (The MathWorks) builds the executable and which processor instructions they assume. Unfortunately, they likely code to a "lowest common denominator" of the hardware out there, because they know that not all customers will have the latest CPUs with enhanced vector instructions that make use of the built-in vector pipelines in current processors.
I've never tried it myself, but if you have a graphics card with a lot of memory, you could possibly try MATLAB's GPU support...
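For the "whos M" question, whos can also return its information as a struct, which makes the storage cost of each class easy to compare. A small sketch using hypothetical stand-in arrays (Mu8, Mdbl) rather than the real 30000 x 500 x 500 data:

```matlab
% Small arrays standing in for M before and after conversion to double
Mu8  = zeros(300, 50, 50, 'uint8');
Mdbl = double(Mu8);

% whos with an output argument returns a struct with name, size, bytes, class
infoU8  = whos('Mu8');
infoDbl = whos('Mdbl');
fprintf('%s: %s, %d bytes\n', infoU8.name, infoU8.class, infoU8.bytes);
fprintf('%s: %s, %d bytes\n', infoDbl.name, infoDbl.class, infoDbl.bytes);
```

The double copy reports eight times the bytes of the uint8 original, the same ratio that turns about 7.5 GB on disk into about 60 GB in memory.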


Categories

Numeric Types

Version

R2022b
