Monte Carlo technique for normally distributed random variable

6 Ansichten (letzte 30 Tage)
Daniel Kwak
Daniel Kwak am 28 Nov. 2018
Bearbeitet: Daniel Kwak am 29 Nov. 2018
I am given a task where I have to use a simple Monte Carlo technique to estimate P(A > 5), where A is a variable in a normal distribution.
Using the code below, I am generating numbers in a normal distribution with a mean of 100 and s.d. of 1: (reference: https://www.vertex42.com/ExcelArticles/mc/MatlabMCExample.html)
x = ( randn(n,1) * 1 ) + 100;
So after that code spits out numbers and gets stored in the array x, how would I be able to pick out all the numbers in the array that are +5 s.d. away from the mean, AND count how many of them there are? (so for this case, pick all the numbers in the array that are >= 105).

Antworten (2)

TADA
TADA am 28 Nov. 2018
Bearbeitet: TADA am 28 Nov. 2018
sig = 1; % standard deviation
mu = 100; % mean
n = 10000; % number of values
x = ( randn(n,1) * sig ) + mu;
z = 5; % desired standard score
flags = (x - z*sig) >= mu;
nOut = sum(flags); % this is how you can count them
disp(nOut);
nOut = length(x(flags)); % another way to count them
outValues = x(flags); % this is how you retrieve the values
disp(outValues);
disp(nOut);
It's not going to generate a lot of those outliers though, because >99% of the values are within 3 standard deviations away from the mean value. a standard score of >=5 is A LOT
I only got a couple after increasing n to 10,000,000
you can also use normrnd to generate your population, it's easier because it accepts the mean and standard deviation as input arguments.
  3 Kommentare
John D'Errico
John D'Errico am 28 Nov. 2018
Sometimes hard to know what your instructor is thinking. ;-) But my guess is this is an attempt to teach that Monte Carlo is not always a good thing when the events you care about are truly rare.
Daniel Kwak
Daniel Kwak am 29 Nov. 2018
Bearbeitet: Daniel Kwak am 29 Nov. 2018
Ahh! It all makes sense now because my instructor went over the Importance sampling technique today, which allows you to modify the function so that you can make the sample size 'n' to be way smaller and still achieve similar results, (i.e. shifting the mean from 100 to 500 in this case).
No wonder haha. Ty for your insight!

Melden Sie sich an, um zu kommentieren.


John D'Errico
John D'Errico am 28 Nov. 2018
Bearbeitet: John D'Errico am 28 Nov. 2018
I'd suggest that a better way is to employ a truncated normal distribution. So, if you carefully sample points from a truncated normal, where all events lie above say 4 sigma, then the odds of getting a sample at 5+ sigma or more are reasonably high.
Now you can compute the desired probability (with some careful mathematics). The problem is, my guess is this is against the philosophy of the homework assignment, which is to do it using a pure (untruncated) Gaussian. And since those 5+ sigma events are darn rare, you need a huge sample size to get any points in that bin at all.
format long g
normcdf(5)
ans =
0.999999713348428
2*(1 - normcdf(5))
ans =
5.7330314384707e-07
So tens or better yet, hundreds of millions of samples will be needed for brute force Monte Carlo sampling.
sum(abs(randn(1,1e8)) >= 5)
ans =
54
which matches the expected number of events nicely:
2*(1 - normcdf(5))*1e8
ans =
57.330314384707
So how might a truncated Gaussian sampling work? I'll truncate at +/- 4 sigma.
n = 1e6;
T = 4;
x = norminv(normcdf(-T)*rand(1,n)).*(2*(rand(1,n)<0.5)-1);
count5 = sum(abs(x) >= 5);
P5 = count5/n*2*normcdf(-T)
P5 =
5.76416601362783e-7
Which compares far more favorably with the actual probability of 5.7330314384707e-07, with a far smaller sample size. As I said, it is a bit of mathematics that seems to skirt acceptablity for such a problem. Certainly, if I was your instructor, I might look slightly askew at the approach, except that if you really understood how I did this and what was done here, it would also indicate that you do know what you are doing.
  2 Kommentare
Daniel Kwak
Daniel Kwak am 28 Nov. 2018
Bearbeitet: Daniel Kwak am 28 Nov. 2018
I appreciate your more indepth approach to the problem. I have one question. Towards the end of your code, you are multiplying by 2 to account for the both sides of the bell curve, right? (in count5/n*2*normcdf(-T))
John D'Errico
John D'Errico am 28 Nov. 2018
Yes. normcdf(-T) for example, computes the integral from -inf to -T. But that is only the lower tail. Double it to get the upper tail too, since you care about the case where abs(x)>=5.

Melden Sie sich an, um zu kommentieren.

Produkte

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by