Should mvnrnd Always Advance the State of the Global Stream

Question

Paul am 26 Mai 2020

0
Verknüpfen

Direkter Link zu dieser Frage

https://de.mathworks.com/matlabcentral/answers/534273-should-mvnrnd-always-advance-the-state-of-the-global-stream

Bearbeitet: Paul am 6 Jul. 2023

Consider the following:

>> mu=[1 1]; Sigma=eye(2);

rng('default')

preu1 = rand(1,3);

n1 = mvnrnd(mu,Sigma);

u1 = rand(1,3);

rng('default')

preu2 = rand(1,3);

n2 = mvnrnd(mu,Sigma);

u2 = rand(1,3);

rng('default')

n3 = mvnrnd(mu,0*Sigma);

u3 = rand(1,3);

>> [isequal(u1,u2) isequal(u2,u3) isequal(u3,preu2)]

ans =

1×3 logical array

1 0 1

Apparently mvnrnd doesn't actually call randn if it detects that the input covariance is zero. That may be good for efficiency, but is it the best behavior for repeatability? This behavior seems to be a contradiction to the general direction of how to manage the Global Stream to reproduce results. doc mvnrnd is silent on how it handles this special case.

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

Melden Sie sich an, um zu kommentieren.

Melden Sie sich an, um diese Frage zu beantworten.

Answer 1

Steven Lord am 26 Mai 2020

0
Verknüpfen

Direkter Link zu dieser Antwort

https://de.mathworks.com/matlabcentral/answers/534273-should-mvnrnd-always-advance-the-state-of-the-global-stream#answer_439373

There's no requirement that these two lines of code that call the same function with different inputs need follow identical code paths.

myfun(A, B)
myfun(A, C)

The lines of code in your example look similar to the human eye, but I could rewrite them to be less similar but equivalent to what you wrote.

x1 = mvnrnd(mu, eye(2))
x2 = mvnrnd(mu, zeros(2))

9 Kommentare
7 ältere Kommentare anzeigen7 ältere Kommentare ausblenden

Paul am 26 Mai 2020

Bearbeitet: Paul am 26 Mai 2020

I realize there is not a general requirement that a function with two inputs always follow identical code paths. And I realize that doc mvnrnd doesn't specifically guarantee that it will call randn for all inputs. But my question is if it would be better that mvnrnd always does advance the state of the Global Stream in all use cases both for usability and to meet user expectations. From the documentation:

"By controlling the default random number stream and its state, you can control how the RNGs in Statistics and Machine Learning Toolbox software generate random values. For example, to reproduce the same sequence of values from an RNG, you can save and restore the default stream's state, or reset the default stream. For details on managing the default random number stream, see Managing the Global Stream(MATLAB)."

These statements, along with the content in Managing the Global Stream, certainly suggest a reasonable interpretation that mvnrnd always advances the state of the Global Stream regardless of the inputs.

But apparently it does not, which is fine from a computational perspective. And I'm willing to accept for now that it's not a bug. But my question is: should it advance the Global Stream, from a usability perspective?

As it stands, consider an application with a sequence of calls to various RNGs, including mvnrnd, and assume that anything that controls the dimensions of the outputs of the RNGs is fixed. However, the values of the input parameters that are used to call the RNG functions are provided by the user. Would it not be reasonable to expect that executing the sequence twice, and executing rng('default') prior each execution, result in the same outputs for those RNGs that used the same parameters in both runs? As shown in the example, if on the second run the user sets Sigma to 0 for a call to mvnrnd, then all rand* calls downcode from that call to mvnrnd that use the Global Stream will be affected as well. As it stands, it appears that the only mitigation is for the programmer to know of and catch the special cases to advance the Global Stream manually. Is that what we want? Is there another option?

Steven Lord am 26 Mai 2020

Bearbeitet: Steven Lord am 26 Mai 2020

But my question is: should it advance the Global Stream, from a usability perspective?

I would argue no.

Let's play two hands of poker by calling a hypothetical poker() function with the number of players and the name of the game to play. [Inside poker() we start off with a deck of cards shuffled with randperm.]

Our first hand is

poker(2, 'Texas hold em')

Since there are two of us playing (the first input to the poker() function), we need a total of nine cards: my two hole cards, your two hole cards, and the five common cards for the flop, turn, and river. I'm assuming neither of us chooses to fold at any point.

Now a friend stops by and decides to join. I reset the global generator used by randperm and call

poker(3, 'Texas hold em')

That means we deal eleven cards: my two hole cards, your two hole cards, our friend's two hole cards, and the five common cards. [You and I are cheating, since we know all but the final two common cards from our previous hand, but since this is a hypothetical example on MATLAB Answers that's fine.]

Do you expect that after the first call to poker() we should have discarded two cards (advanced the global stream) expecting / predicting that our friend will stop by and decide to play, so the state of the deck is the same after each hand?

If you want to be certain the stream is in a certain state after the mvnrnd call, set it to that state after the call.

Paul am 26 Mai 2020

Bearbeitet: Paul am 27 Mai 2020

Steve,

I've reread your example several times and I'm still not seeing how it's on point. Among other things, adding that third player changed the requirements on the sequence that is generated for your simulated poker game.

Continuing with the poker theme, suppose you're simulating the last five tables at the WSOP before the next cut and so each table plays a hand simultaneously. So you might have a function that includes code that looks like this to simulate what happens.

for n = 1:ntrials
for table = 1:5
    shuffle_and_deal(mu(table),Sigma(:,:,table))
end
end

Assume that the suffle_and_deal function calls mvnrnd with those input arguments. So the result of the simulation depends on the state of the Global Stream at the start of the simulation. Run the simulation once. Reset the Global Stream to whatever its state was at the start of the first run. Now set Sigma(:,:,1) = zeros(2). Rerun the simulation. Would you expect the output of the simulation be any different for tables 2-5? Why should changing only the parameter of the distribution for table 1 have any effect on the other tables? After all, nothing changed for those tables.

If you want to be certain the stream is in a certain state after the mvnrnd call, set it to that state after the call

I'm surprised you said this. It seems counter to the concept of the random number architecture. I'm pretty sure that I've even seen somehwere in the doc that advises to not do things like this, because it shouldn't be needed. It also seems intractable for code that has a sequence of, for example, 1000 calls to mvnrnd. Set the state of the Global Stream after each call? And do so in such a way that you get the same answer for the entire sequence if the state of the Global Stream as the start of the calling squence is reset? I'd rather just a) wrap mvnrnd in my own function and catch the special case, or b) not use mvnrnd at all and just roll my own.

Also, normrnd from the same toolbox has a different behavior when called with standard deviation of zero:

>> s1=RandStream.getGlobalStream.State;x=mvnrnd(1,0);s2=RandStream.getGlobalStream.State;isequal(s1,s2)
ans =
  logical
   1
>> s1=RandStream.getGlobalStream.State;x=normrnd(1,0);s2=RandStream.getGlobalStream.State;isequal(s1,s2)
ans =
  logical
   0

Based on the doc for the toolbox, and the doc for Managing the Global Stream, and how normrnd works, and how the other non-toolbox rng functions like randn and rand work, what would be your expectation as to the state of the Global Stream after a call to mvnrnd with Sigma = 0?

Paul

Paul am 6 Jul. 2023

Bearbeitet: Paul am 6 Jul. 2023

mvnrnd calls randn and then transforms the output of randn to the final output of mvnrnd. So as far as the random number generation is concerned, the parameters of the disribution are the same on each call to mvnrnd.

However, the mvnrnd call to randn will only request the minimum number of standard normal samples needed to compute the output of mvnrnd. In degenerate cases when the covariance is only positive semi-definite, the number of randn samples is less than required for the case when Sigma is positive definite.

The observed behavior results from the implementation of mvnrnd, which could be implemented to always generate the same number of outputs from randn, regardless of the properties of the covariance.

Having said that, you make a good point that there is certainly the possbility that for non-degenerate cases the state of the stream might not be the same if not using the Inversion trasnformation algorithm, but that's not what's happening here.

Steven Lord am 6 Jul. 2023

The observed behavior results from the implementation of mvnrnd, which could be implemented to always generate the same number of outputs from randn, regardless of the properties of the covariance.

Yes, it could. But calling randn isn't free. It's not usually that expensive, but it's not free.

timeit(@() randn(1, 1000))
ans = 9.1405e-06
timeit(@() 1) % About as close to free as you can get in MATLAB
Warning: The measured time for F may be inaccurate because it is running too fast. Try measuring something that takes longer.
ans = 0

If you care about the state of the random number generator before (or after) a particular function call, set it to a known fixed state before or after that call. Don't assume that the implementation will do what you think it should do. What if one of the other functions called as part of mvnrnd's implementation changes in a new release to use more (or fewer) random numbers than it did in the previous release? Should mvnrnd automatically adapt to call randn fewer (or more) times to maintain the state of the random number generator the same after it has finished executing in the new release that it was in after executing in the older release? That may not be possible.

Steven Lord am 6 Jul. 2023

randn isn't free. But it's a price I'd be willing to pay to make it easier to have repeatable Monte Carlo simulations.

Is every user of mvnrnd also willing to pay that price? I'm fairly sure the answer to that question is no.

Scanning back through the three year history of this discussion, I noted this section that I don't think I paid close enough attention to back when it was posted.

As it stands, consider an application with a sequence of calls to various RNGs, including mvnrnd, and assume that anything that controls the dimensions of the outputs of the RNGs is fixed. However, the values of the input parameters that are used to call the RNG functions are provided by the user. Would it not be reasonable to expect that executing the sequence twice, and executing rng('default') prior each execution, result in the same outputs for those RNGs that used the same parameters in both runs? As shown in the example, if on the second run the user sets Sigma to 0 for a call to mvnrnd, then all rand* calls downcode from that call to mvnrnd that use the Global Stream will be affected as well.

Your expectation is reasonable. The key point is that for the same user inputs the code should give the same results. But in your example, Sigma is one of those user inputs! So in the following example I would expect x0 and x1 to be the same, but I would not necessarily expect x2 to be the same as x0 or x1 because they're using different values of Sigma. Since mySimulation calls mvnrnd internally then calls some other random number generator function (let's say randi), the call that generates x0 may get different results internally from randi than the call that generates x2.

Sigma = 0;
rng default
x0 = mySimulation(Sigma);
rng default
x1 = mySimulation(Sigma);
Sigma = 1;
rng default
x2 = mySimulation(Sigma);
isequal(x0, x1)
ans = logical
   1
isequal(x0, x2)
ans = logical
   0

I'm pretty sure that I've even seen somehwere in the doc that advises to not do things like this, because it shouldn't be needed.

I'm pretty sure you're referring to the Note on this documentation page. That's intended to warn against doing something like the following (commented out since calling rng shuffle 1e5 times takes long enough that the MATLAB session MATLAB Answers uses to run code reaches the time limit.) Shuffling the random number generator 1e5 times doesn't necessarily make the numbers "more random" like the comment states the code author believes.

But reseeding the generator once before each time you run your simulation (for reproducibility) as I did above in the x0 / x1 / x2 example is fine.

%{
n = 1e5;
x1 = zeros(1, n);
for k = 1:n
    rng shuffle % To make the numbers "more random"
    x1(k) = rand;
end
%}
function y = mySimulation(Sigma)
% "Burn off" Sigma + 10 numbers then return a random integer in the range [1, 10]
randn(1, Sigma+10);
y = randi(10, 1);
end

Paul am 6 Jul. 2023

Is every user of mvnrnd also willing to pay that price?

Obviously, I can't speak for every user, only for myself. I suppose that Matlab has many features that some users think should be implemented one way and other users think should be implemented another way.

Since mySimulation calls mvnrnd internally ...

mySimulation does not call mvnrnd, it only calls randn and then randi. I assume that x2 is different than x1 because the calls to randn are chewing up a different number of random uniform samples using the default Ziggurat transformation algorithm. If mySimulation is modified to call mvnrnd, then all the results are the same.

Sigma = 0;
rng default
x0 = mySimulation(Sigma);
rng default
x1 = mySimulation(Sigma);
Sigma = 1;
rng default
x2 = mySimulation(Sigma);
isequal(x0, x1)
ans = logical
   1
isequal(x0, x2)
ans = logical
   1

I'll try to add some more context from my perspective. Suppose I have a simulation of three different types of widgets, and the properties of each widget are determined by a random draw based on statistical model of each type of widget. So the first step in my simulation is determine the properties of each widget to realize a specific model for each widget, and the second step is to simulate the peformance of each realized widget. Here, I'm just showing the first step of this process for the first simulation, i.e., the code is not showing the full Monte Carlo simulation loop.

Set the global stream to use inversion, one uniform sample per normal sample.

clear
globalStream = RandStream.getGlobalStream;
globalStream.NormalTransform='Inversion';

Set the mean and covariance for each type of widget.

Sigma = repmat(eye(3),1,1,3).*cat(3,1,2,3);
mu = zeros(1,3);

Step 1 of first simulation, get the properties the realization of each widget

rng(100)
for ii = 1:3
    x1(ii,:) = mvnrnd(mu,Sigma(:,:,ii));
end

Repeat step 1 of first simulation, get the properties the realization of each widget

rng(100)
for ii = 1:3
    x2(ii,:) = mvnrnd(mu,Sigma(:,:,ii));
end

Of course, nothing has changed.

isequal(x1,x2)
ans = logical
   1

Now, change the covariance of the second type of widget. Run step 1 of the first simulation

rng(100)
Sigma(:,:,2) = zeros(3);
for ii = 1:3
    x3(ii,:) = mvnrnd(mu,Sigma(:,:,ii));
end

Unsurprisingly, the realization of the second widget has changed because we've changed the statistical characterization of the second type of widget.

isequal(x2(2,:),x3(2,:))
ans = logical
   0

The realization of the first widget is the same, which seems nice because we used the same master seed (100) and didn't change anything about its statistical characterization

isequal(x2(1,:),x3(1,:))
ans = logical
   1

But the realization for the third widget is different, even though the sim used the same master seed and its statistical characterization also hasn't changed.

isequal(x2(3,:),x3(3,:))
ans = logical
   0

Reasonable people may say that such results are satisfactory. I happen to disagree.

But given that's how mvnrnd works, what's the strategy so that I can use the same master seed (100) and not have the value of Sigma(:,:,2) change x(3,:)

Are you suggesting something like this?

rng(100)
for ii = 1:3
      % set the seed of the gnerator here, but do so idependently
      % of the downstream call to random number generators, perhaps by
      % using a separate stream to get the seeds into rng?
    rng(?)
    x1(ii,:) = mvnrnd(mu,Sigma(:,:,ii));
end

function y = mySimulation(Sigma)
% "Burn off" Sigma + 10 numbers then return a random integer in the range [1, 10]
% randn(1, Sigma+10);
mvnrnd(0,1,1, Sigma+10);
y = randi(10, 1);
end

Melden Sie sich an, um zu kommentieren.

Should mvnrnd Always Advance the State of the Global Stream

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

Antworten (1)

9 Kommentare
7 ältere Kommentare anzeigen7 ältere Kommentare ausblenden

Siehe auch

Kategorien

Tags

Produkte

Version

Community Treasure Hunt

Should mvnrnd Always Advance the State of the Global Stream

0 Kommentare -2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

Antworten (1)

9 Kommentare 7 ältere Kommentare anzeigen7 ältere Kommentare ausblenden

Siehe auch

Kategorien

Tags

Produkte

Version

Community Treasure Hunt

0 Kommentare
-2 ältere Kommentare anzeigen-2 ältere Kommentare ausblenden

9 Kommentare
7 ältere Kommentare anzeigen7 ältere Kommentare ausblenden