Fastest possible code for AUC between a continuous predictor and a binary target

Hi Folks,
I am on the hunt for the fastest possible Matlab code that computes empirical exact Area Under the Receiver/Operator Curve (AUC). The built-in Matlab's perfcurve returns auc as the 4th output but it is terribly slow. Here is my current fastest code, would apprreciate suggestions if/how to make it faster. cumsum(flipud(x)) appear optimizable, but I could not find any faster solution so far. Another lead could be that all variables involved are integer until the very last scaling normalization /(2*s0*s1).
%AUC - Area Under the Receiver/Operator Curve for binary target vector y
% and a continuous or binary predictor vector x
function a=auc(y,x)
n=numel(x); s1=sum(y); s0=n-s1;
if islogical(x) %Binary x case a = (1+tp-fp)/2
tp=sum(x & y); fp=(sum(x)-tp)/s0; tp=tp/s1; a=(1+tp-fp)/2;
else %Continuous numerical x case
[x,j]=sort(x);
i=[find(diff(x)~=0);n]; c=diff([0;i]);
y=cumsum((y(j,:))); y=diff([0;y(i,:)]);
tp=[0;cumsum(flipud(y))]; fp=[0;cumsum(flipud(c))]-tp;
a=sum(diff(fp).*(tp(1:end-1,:)+tp(2:end,:)))/(2*s0*s1);
end
a=max(a,1-a);
A quick speed test to beat (on a small portable laptop):
x=rand(1e7,1); y=rand(1e7,1)>.5; tic;a=auc(y,x);t=toc; [a t]
ans =
0.5001 1.7108

Antworten (1)

nnz() on a logical matrix is faster than sum() on the matrix
foo = rand(1,1e7) < 0.1;
N = 50;
t1 = zeros(N,1);
t2 = zeros(N,1);
for K = 1 : N; t1(K) = timeit(@() sum(foo), 0); end
for K = 1 : N; t2(K) = timeit(@() nnz(foo), 0); end
plot([t1, t2])
legend({'sum', 'nnz'})

3 Kommentare

Thanks Walter, thats cool, didn't know this function. Though, it will speed up mostly the binary predictor case that is anyway very fast. Here is the profiler breakdown of the code for 1e7-long inputs:
The second section has no useful comments. I do not know what it is intended to do or why it is doing things that way.
Sorry, I did not expect to target the merit here but here you go:
%AUC - Area Under the Receiver/Operator Curve for binary target vector y
% and a continuous or binary predictor vector x
function a=auc1(y,x)
n=numel(x); s1=nnz(y); s0=n-s1;
if islogical(x) %Binary x case a = (1+tp-fp)/2
tp=nnz(x & y); fp=(nnz(x)-tp)/s0; tp=tp/s1; a=(1+tp-fp)/2;
else %Continuous numerical x case
[x,j]=sort(x); %Sort predictor values x and get sorted index j
i=[find(diff(x)~=0);n]); %Find indices of the last unique values of sorted x
c=diff([0;i]); %Compute the frequency (count) of all unique values of x
y=cumsum((y(j,:))); %Compute cumulative sum of targets along the order of sorted x
y=diff([0;y(i,:)]); %Compute the number of positive targets observed for each unique x
tp=[0;cumsum(flipud(y))]; %Compute true positives for the ROC curve i.e. the cumulative sum of positive targets observed along the inverse order of x
fp=[0;cumsum(flipud(c))]-tp; %Compute false positives for the ROC curve i.e. cumulative number of negative targets observed along the inverse order of x
%Compute AUC i.e. the area under the curve (integral) defined by the function: tp=f(fp), normalized within 0:1 limits
a = sum(diff(fp).*(tp(1:end-1,:)+tp(2:end,:)))/(2*s0*s1);
end
a=max(a,1-a); %Since AUC(y,x)=1-AUC(y,-x), we are looking for the maximum AUC achievable out of x predictor

Melden Sie sich an, um zu kommentieren.

Kategorien

Produkte

Version

R2021b

Tags

Gefragt:

am 10 Okt. 2021

Bearbeitet:

am 10 Okt. 2021

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by