Problem 1983. Big data

Created by Alfonso Nieto-Castanon

Optimize this line of code:

 B = sum(gradient(corrcoef(A)).^2);

for a matrix A with size(A,2)>>size(A,1)

Description:

Analyses of large datasets often require some level of optimization for speed and/or memory usage. This is an example problem where both the initial data A and final measure of interest B fit perfectly well in memory, but the intermediate variables (in this case an impossibly large correlation matrix) required to compute the final measure of interest do not.

We have a large 2-dimensional matrix A (with dimensions 100 x 100,000).

We need to compute the following row vector B (with dimensions 1 x 100,000):

 B = sum(gradient(corrcoef(A)).^2);

This computes first the matrix of correlation coefficients for each pair of columns in A:

 a = corrcoef(A)

(a 100,000 by 100,000 matrix), then computes the spatial derivative of the resulting correlation matrix along the second dimension:

 b = gradient(a)

(another 100,000 by 100,000 matrix), and finally computes the squared norm of each column in the resulting matrix:

 B = sum(b.^2,1)

(a 100,000 element vector)
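The three steps above can be checked on a matrix small enough for the full correlation matrix to fit in memory. The sizes below (10 x 50) are illustrative assumptions, not the sizes used in the problem:

```matlab
% Small synthetic matrix; the 10-by-50 size is an assumption for illustration.
rng(0);
A = randn(10, 50);

a = corrcoef(A);     % 50-by-50 matrix of correlations between columns of A
b = gradient(a);     % derivative along the second dimension, same size as a
B = sum(b.^2, 1);    % 1-by-50 vector of squared column norms

% The one-line expression from the problem statement gives the same vector.
B_oneline = sum(gradient(corrcoef(A)).^2);
```

With a single output, gradient uses central differences for interior columns, so b(:,j) equals (a(:,j+1) - a(:,j-1))/2 there, and one-sided differences at the first and last columns.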

This straightforward "vectorized" approach nevertheless fails because it requires too much memory: storing a single 100,000 x 100,000 correlation matrix in double precision takes around 80 GB.
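The memory figure follows from simple arithmetic, assuming MATLAB's default double-precision storage (8 bytes per element). The variable names here are illustrative:

```matlab
n = 100000;                             % number of columns of A
bytes_per_double = 8;                   % assumption: default double storage

bytes_corr = n^2 * bytes_per_double;    % one n-by-n matrix: 8e10 bytes
gb_corr    = bytes_corr / 1e9;          % 80 GB in decimal units
gib_corr   = bytes_corr / 2^30;         % roughly 74.5 GiB in binary units

bytes_A    = 100 * n * bytes_per_double;  % A itself: 8e7 bytes, i.e. 80 MB
```

The gradient step produces a second matrix of the same size, so the direct approach needs even more than this in total, while A itself is about a thousand times smaller.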

We clearly need some form of simplification/optimization. Can you compute the measure B within the time limit of a Cody solution (typically about 30 seconds)?
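One possible simplification, offered as a sketch rather than as the intended or fastest solution: if Z is A with each column centered and scaled to unit norm, then corrcoef(A) equals Z'*Z. Because gradient is linear in the columns, gradient(Z'*Z) equals Z'*gradient(Z), and the squared norm of column j of that product is D(:,j)'*(Z*Z')*D(:,j), where D = gradient(Z). Only matrices the size of A and one small square matrix are then needed. The names Z, D and M are illustrative, the example uses a small synthetic matrix so the result can be compared against the direct computation, and it assumes no column of A is constant and a MATLAB release with implicit expansion (R2016b or later):

```matlab
% Small synthetic example; sizes are assumptions chosen so that the direct
% computation still fits in memory and can serve as a reference.
rng(0);
A = randn(20, 300);

% Reference: the direct computation from the problem statement.
B_ref = sum(gradient(corrcoef(A)).^2, 1);

% Center each column and scale it to unit norm, so corrcoef(A) = Z'*Z.
Z = A - mean(A, 1);
Z = Z ./ sqrt(sum(Z.^2, 1));

% gradient acts linearly on columns: gradient(Z'*Z) = Z'*gradient(Z).
D = gradient(Z);      % differences along the second dimension, size of A
M = Z * Z.';          % small size(A,1)-by-size(A,1) matrix

% Squared norm of column j of Z'*D is D(:,j)'*M*D(:,j).
B = sum(D .* (M * D), 1);
```

In this sketch every intermediate is either the size of A or size(A,1)-by-size(A,1), so the large square correlation matrix is never formed.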

Solutions will be scored based on computation time (score equal to total time in seconds).

Context: (not relevant to solving this problem)

This problem arises in the analysis of fMRI datasets. A typical result from an fMRI scanner session is a 4-dimensional matrix A(x,y,z,t), where the first three dimensions are spatial (a scan of the subject's head/brain) and the fourth dimension is temporal (sequential scans obtained during the session). Think of these as time-varying three-dimensional pictures of your brain activation. A lot of research in the past few years has focused on functional connectivity, a measure of the temporal correlation between the "activation" of any pair of brain areas. Several recent papers have investigated the possibility of obtaining entirely data-driven parcellations of the brain (partitioning the brain into functionally homogeneous areas) based on these spatial patterns of functional connectivity. The measure B above is one of the measures that have been suggested to drive these parcellations (borders between two adjacent but functionally distinct brain areas are expected to show higher spatial gradients in functional connectivity profiles).

For simplicity I have collapsed the three spatial dimensions into one for this problem, but the computational complexity of the original computation is approximately the same: a typical scanner session yields on the order of several hundred thousand "voxels" (three-dimensional "pixels") within the brain and a few hundred time points, which makes computing the entire voxel-to-voxel correlation matrix, or measures derived from it, rather challenging.

Solution Stats

39 Solutions
4 Solvers

Last Solution submitted on Nov 29, 2020

Problem Comments

3 Comments

Elmar Zander on 22 Mar 2017

Hi Alfonso! There seems to be a problem in the test code. It says something like
"Error using evalin Undefined function or variable 'time_count'. Error in TestPoint4 (line 9) assignin('caller','time_count',[evalin('caller','time_count') t1]);"
from the 4th test on. Probably you have to store the time_count variable somewhere else...

yurenchu on 27 May 2017

Hi Alfonso! Tests 4, 5 and 6 in the test suite seem to be malfunctioning because of a problem with function 'evalin' and/or undefined variable 'time_count'. Could you please fix this?

Alfonso Nieto-Castanon on 26 Oct 2017

Sorry about this (the error was a side effect of Cody changing to a more general unit testing framework some time back). I have fixed the test suite now.
