bootstrap p-value

Maria K on 23 Jun 2018
Answered: John Williams on 15 Oct 2018
Hello!
Does anyone know how to get a p-value using bootstrap? I have created a sample of 20 bootstrapped data sets and calculated a statistic (RMSD) for each of them. Now I want to see whether the RMSD from my real data is consistent with having been generated from the null distribution. The problem is that I didn't use MATLAB's bootstrap functions to get these data, so I would like a reply that helps me calculate the p-value from the data in the form I already have. I have attached the RMSDs from the bootstrapped data as all_RMSD. The RMSD of the original data is 19.9976.
Thanks in advance! ~M.
  1 Comment
dpb on 23 Jun 2018
This is more of a statistics question than a MATLAB one. A couple of links on how to compute the statistic are bootstrap lecture and bootstrap hypothesis. The first is a general tutorial/lecture on the bootstrap with example usage; the second shows more specifically how to compute the p-value from a given numerical example.


Accepted Answer

Jeff Miller on 24 Jun 2018
Let me try to summarize your question (to see if I understand it correctly): You are evaluating some underlying model that predicts a "null distribution" of RMSD values. You want to see whether your observed RMSD value could reasonably be a random sample from that null distribution. If not, you will conclude that the underlying model does not apply to the situation in which you collected your data.
Given that interpretation, this is actually not a problem for bootstrapping. Bootstrapping means constructing new samples from the original data and recomputing the statistic of interest--in your case apparently RMSD--for each of the new samples. You aren't resampling your original data, but rather generating predicted values from some underlying model.
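For contrast, here is a minimal sketch of what bootstrapping in the strict sense would look like: resampling the original data with replacement and recomputing the statistic each time. The data vector and the root-mean-square statistic below are placeholders I made up for illustration; they are not the poster's data or RMSD definition.

```matlab
% Minimal bootstrap sketch. "data" is a hypothetical stand-in sample,
% and the root-mean-square is a placeholder for the statistic of interest.
rng(1);                          % fix the seed so the run is repeatable
data  = randn(50, 1);            % placeholder sample, not the poster's data
nBoot = 1000;                    % number of bootstrap resamples
n     = numel(data);

bootStat = zeros(nBoot, 1);
for b = 1:nBoot
    idx = randi(n, n, 1);                    % n indices drawn with replacement
    bootStat(b) = sqrt(mean(data(idx).^2));  % recompute the statistic on the resample
end
```

Each entry of bootStat comes from the observed data itself, which is the key difference from simulating values out of a model.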
Since you can apparently generate the random values predicted by the underlying model, all you have to do is generate a lot of them and see how your observed value compares to them. If your observed value is greater than 97.5% of the generated values or smaller than 97.5% of them (that is, it falls in the top or bottom 2.5%), then it lies in the two tails that together hold 5% of the predicted distribution, and you would conclude that the underlying model is not right for your situation.
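The tail check described above could be sketched as follows. The simulated values here are invented (normal draws around 5) purely so the snippet runs; in practice the vector would hold the RMSDs generated from the model, for example the contents of all_RMSD. Only the observed value 19.9976 comes from the question.

```matlab
% Two-sided tail check of an observed value against simulated values.
% nullRMSD is hypothetical data used only to make the example runnable.
rng(2);
nullRMSD = 5 + randn(2000, 1);   % placeholder for the model-generated RMSDs
obsRMSD  = 19.9976;              % observed RMSD from the question

fracBelow = mean(nullRMSD < obsRMSD);   % share of simulated values below the observed one
fracAbove = mean(nullRMSD > obsRMSD);   % share of simulated values above the observed one

% Observed value is in the 5% tails if it exceeds 97.5% of the simulated
% values or is smaller than 97.5% of them.
inTail = fracBelow > 0.975 || fracAbove > 0.975;
```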
In practice, though, this usually means generating a lot more than 20 predicted RMSD values. Normally I would expect to see the observed value compared with a distribution compiled from hundreds if not thousands of predicted RMSD values.
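One way to see why 20 values are too few, assuming the common convention of counting the observed value as one of the draws (this convention is my assumption, not something stated in the question): with $N$ simulated statistics $T_1,\dots,T_N$ and observed statistic $t_{\mathrm{obs}}$, the one-sided Monte Carlo p-value and its smallest attainable value are

```latex
\hat{p} \;=\; \frac{1 + \#\{\, i : T_i \ge t_{\mathrm{obs}} \,\}}{N + 1},
\qquad
\hat{p}_{\min} \;=\; \frac{1}{N + 1}.
```

With $N = 20$ the smallest possible one-sided value is $1/21 \approx 0.048$, so a two-sided value obtained by doubling cannot fall below about $0.095$. With $N = 1000$ the floor drops to $1/1001 \approx 0.001$.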
Maybe you don't need so many in this case, because your observed RMSD of 19.9976 is so far outside the range of the predicted values in all_RMSD.mat, but you should still generate as many as possible.
And you don't need any special MATLAB code to summarize the results. Just make a frequency distribution of the simulated RMSD values and see where the observed value lies relative to that predicted frequency distribution.
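As a rough sketch of that summary, histcounts can tabulate the simulated values, and a range check shows whether the observed value falls inside them at all. The simulated vector is again invented for illustration; swap in the real simulated RMSDs.

```matlab
% Frequency distribution of simulated RMSDs and position of the observed value.
% simRMSD is hypothetical data; replace it with the model-generated values.
rng(3);
simRMSD = 5 + randn(1000, 1);
obsRMSD = 19.9976;               % observed RMSD from the question

[counts, edges] = histcounts(simRMSD, 20);       % 20 equal-width bins
centers = edges(1:end-1) + diff(edges) / 2;      % bin midpoints
disp([centers(:) counts(:)]);                    % two columns: bin center, count

% True when the observed value lies beyond every simulated value.
outsideRange = obsRMSD < min(simRMSD) || obsRMSD > max(simRMSD);
```

Calling histogram(simRMSD) followed by xline(obsRMSD) (xline needs R2018b or later) would show the same thing graphically.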
  2 Comments
Maria K on 24 Jun 2018
Thanks for your detailed reply. I used 20 samples just to get a crude estimate of the distribution, but you are certainly right (from what I've seen in the literature, about 1000 samples are needed to get a good RMSE approximation). I guess my ultimate question is whether the formula I used to get the p-value is correct. I am not sure if I am allowed to refer to external blogs on MathWorks, but I used the formula suggested here: https://blogs.sas.com/content/iml/2011/11/02/how-to-compute-p-values-for-a-bootstrap-distribution.html
Jeff Miller on 25 Jun 2018
Yes, I think that blog post and I are saying the same thing: "How do I count the number of values in a vector that are greater than a given value" is the same as "see where the observed value lies relative to that predicted frequency distribution."
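In MATLAB terms, that counting step might look like the sketch below. The add-one adjustment and the doubling for a two-sided value are conventions I am assuming here, not necessarily the exact formula in the linked post, and the simulated vector is made-up data.

```matlab
% Empirical p-value by counting simulated values at least as extreme
% as the observed one. simRMSD is hypothetical data for illustration.
rng(4);
simRMSD = 5 + randn(1000, 1);
obsRMSD = 19.9976;               % observed RMSD from the question
N       = numel(simRMSD);

nGreaterEq = sum(simRMSD >= obsRMSD);   % simulated values at or above the observed one
nLessEq    = sum(simRMSD <= obsRMSD);   % simulated values at or below the observed one

% Adding 1 to numerator and denominator (assumed convention) keeps p above zero.
pUpper    = (1 + nGreaterEq) / (N + 1);
pLower    = (1 + nLessEq)    / (N + 1);
pTwoSided = min(1, 2 * min(pUpper, pLower));

fprintf('upper-tail p = %.4f, two-sided p = %.4f\n', pUpper, pTwoSided);
```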


More Answers (1)

John Williams on 15 Oct 2018
Here is some code my advisor wrote back in 2009 that can compute the actual p-value (rather than simply determining whether it is greater or less than 0.05). The methods are described in the Archives of General Psychiatry paper linked below.
https://www.mathworks.com/matlabcentral/fileexchange/69119-bca_bootstrap
https://www.ncbi.nlm.nih.gov/pubmed/19581566
