Implement the "total variation distance" (TVD) in Matlab

I am trying to implement the Total variation distance of probability measures (TVD) in Matlab.
Would it be correct to use the max function, in order to calculate the "supremum" of the TVD equation (here below)?
My attempt:
% Input
A =[ 0.444643925792938 0.258402203856749
0.224416517055655 0.309641873278237
0.0730101735487732 0.148209366391185
0.0825852782764812 0.0848484848484849
0.0867743865948534 0.0727272727272727
0.0550568521843208 0.0440771349862259
0.00718132854578097 0.0121212121212121
0.00418910831837223 0.0336088154269972
0.00478755236385398 0.0269972451790634
0.00359066427289048 0.00110192837465565
0.00538599640933573 0.00220385674931129
0.000598444045481747 0
0.00299222022740874 0.00165289256198347
0 0
0.00119688809096349 0.000550964187327824
0 0.000550964187327824
0.00119688809096349 0.000550964187327824
0 0.000550964187327824
0 0.000550964187327824
0.000598444045481747 0
0.000598444045481747 0
0 0
0 0.000550964187327824
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0.000550964187327824
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0.00119688809096349 0.000550964187327824];
P = A(:,1);
Q = A(:,2);
% Total variation distance (of probability measures)
d = max(abs(P-Q))
d = 0.1862

 Akzeptierte Antwort

Bruno Luong
Bruno Luong am 4 Aug. 2023
Bearbeitet: Bruno Luong am 4 Aug. 2023
Supremum is very often implemented by max, since one can only list or compute a finite set on computer.
However your formula d = max(abs(P-Q)) is not correct to compute TVD.
According to this wiki page; correct formula is given bellow "When Ω is countable"
d = 0.5 * norm(P-Q,1)
or
d = 0.5 * sum(abs(P-Q));

8 Kommentare

To illustrate wiki identity with tiny example:
% Generate random test discrete probability density (pdf) P and Q
n = 5;
P=rand(1,n); P=P/sum(P)
P = 1×5
0.0892 0.3597 0.0123 0.2731 0.2658
Q=rand(1,n); Q=Q/sum(Q)
Q = 1×5
0.1249 0.0693 0.2754 0.2642 0.2663
% Compute TVD using definition
n = length(P);
b = logical(dec2bin(0:2^n-1)-'0');
d = zeros(1,size(b,1));
for k=1:size(b,1)
bk = b(k,:);
Pa = sum(P(bk));
Qa = sum(Q(bk));
dPaQa = abs(Pa-Qa);
d(k)= dPaQa;
end
dPQ = max(d)
dPQ = 0.2993
dFormula = 0.5 * norm(P-Q,1)
dFormula = 0.2993
I must admit this formula is a little bit a small miracle to me. bk where d reaches its max (sup) probably splits P and Q in two parts such that P(bk)>Q(bk) and P(~bk)<Q(~bk) or something like that.
I'm too lazy to make a proof.
Sim
Sim am 4 Aug. 2023
Bearbeitet: Sim am 4 Aug. 2023
Hi @Bruno Luong! Thanks a lot for your contributions to this question :-)
Did I understand correctly what you did, i.e. the following ? (please see if the wikipedia formulas I report here correspond to what you did with Matlab..)
(1) First equation for TVD (from Wikipedia's Definition)
% Compute TVD using definition
n = length(P);
b=logical(dec2bin(0:2^n-1)-'0');
d=0;
d = zeros(1,size(b,1));
for k=1:size(b,1)
bk = b(k,:);
Pa = sum(P(bk));
Qa = sum(Q(bk));
dPaQa = abs(Pa-Qa);
d(k)= dPaQa;
end
dPQ = max(d)
About this First equation for TVD (from the definition): I do not understand why you summed P and Q:
Pa = sum(P(bk)); % ?
Qa = sum(Q(bk)); % ?
Could you please explain it to me?
(2) Second equation for TVD (from Wikipedia's Properties)
dFormula = 0.5 * norm(P-Q,1)
Bruno Luong
Bruno Luong am 4 Aug. 2023
Bearbeitet: Bruno Luong am 4 Aug. 2023
"I do not understand why you summed P and Q:"
bk is the indicatrice of the set A (not your A, the A of the measure F in the definition). In other word A = find(bk); therefore P(A) (the probabilty in the wiki definition) is then sum(P(find(bk))) = sum(P(bk)), P used in MATLAB code is the pdf, we need to sum pdf to get the probability.
Similar for Q.
Second question: yes you undersand correctly
Sim
Sim am 4 Aug. 2023
Bearbeitet: Sim am 4 Aug. 2023
OK... so, basically, what I wrote initially i.e.
d = max(abs(P-Q))
was not fully correct, right?
I tried to compare all your code to what I wrote initially, and there is a small difference between what you did and what I wrote initially:
% Generate random test discrete probability density (pdf) P and Q
n = 5;
P=rand(1,n); P=P/sum(P);
Q=rand(1,n); Q=Q/sum(Q);
% Compute TVD using definition
n = length(P);
b = logical(dec2bin(0:2^n-1)-'0');
d = zeros(1,size(b,1));
for k=1:size(b,1)
bk = b(k,:);
Pa = sum(P(bk));
Qa = sum(Q(bk));
dPaQa = abs(Pa-Qa);
d(k)= dPaQa;
end
dPQ = max(d) % <-- (1) First equation for TVD (from Wikipedia's Definition)
dPQ = 0.4406
dFormula = 0.5 * norm(P-Q,1) % <-- (2) Second equation for TVD (from Wikiperida's Properties)
dFormula = 0.4406
d_Sim = max(abs(P-Q)) % <-- what I wrote initially
d_Sim = 0.2920
Final message to future readers: What I wrote initially is not correct. Please use the @Bruno Luong's code! :-)
Note that
d_Sim = max(abs(P-Q))
is
norm(P-Q, Inf)
Warning! If I use n > 30 the calculation stops for lack of memory... Any workaround?
Bruno Luong
Bruno Luong am 4 Aug. 2023
Bearbeitet: Bruno Luong am 4 Aug. 2023
Don't use the brute force implementation of the initial definition for any discrete pdf with more than 20 values (n = cardinal of Omega), rather use
dFormula = 0.5 * norm(P-Q,1)
The for-loop I made is just to illustrate the correctness of the formula. Just like no-one would computes the determinant of matrix 30 x 30 using Leibniz formula.
Ah ok..great..!! Many many thanks!
Then, I will use:
dFormula = 0.5 * norm(P-Q,1)

Melden Sie sich an, um zu kommentieren.

Weitere Antworten (1)

Debadipto
Debadipto am 4 Aug. 2023

1 Stimme

Hi Sim,
Upon searching, I found the exact question being asked on stackoverflow (I'm assuming it was posted by you only), where somebody has already answered the question. I am attaching the link to that answer for future reference:

Kategorien

Gefragt:

Sim
am 3 Jul. 2023

Bearbeitet:

am 4 Aug. 2023

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by