How to best determine the probability of a distribution given an outlying observation?
Hi,
I have a classification problem. I have a set of data from a reference process (let's call that "known") and a set of data from a second process (let's call that "test").
Hypothesis 0 is that the test sample came from the same process as the "known" data, and will therefore have the same distribution.
Hypothesis 1 is that the test sample came from a different process. However, here is the catch: for all but one observation, this process has the same distribution as the "known" process. Just one observation will be "suspiciously" low.
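A sketch of how the two hypotheses could be written formally, assuming the n test observations are independent and that F (the distribution of the "known" process) is continuous. The symbols F, G, n, j, and T are notation introduced here, not taken from the question. Because Hypothesis 1 differs from Hypothesis 0 only by one low value, the sample minimum T is a natural statistic, and its lower-tail probability under Hypothesis 0 follows from all n values having to exceed t:

```latex
H_0:\ x_1,\dots,x_n \overset{\text{iid}}{\sim} F
\qquad
H_1:\ x_j \sim G \text{ for one unknown } j,\quad x_i \overset{\text{iid}}{\sim} F \ (i \neq j),
\ \text{with } G \text{ lying below } F

T = \min_{1 \le i \le n} x_i,
\qquad
P(T \le t \mid H_0) = 1 - \bigl(1 - F(t)\bigr)^{n}
```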
Here is a picture to explain it better:

In this figure, the red histogram is the reference ("known") distribution and the blue histogram is the questioned ("test") distribution. In this particular case, I already know that the test data came from a different process. It may not be completely clear because the histograms overlap, but the two distributions match pretty well, except for a single blue observation that is suspiciously low.
What I need now is a method that takes both samples and returns the probability of observing a value as low as that extreme blue one, given that the test data come from the "known" distribution. I know how to calculate the probability of a single particular observation, but how do I properly account for the number of observations? Would a KS test alone be appropriate? This strikes me as Stats 101, but it's been a while, and I don't want to get it wrong.
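A minimal Python sketch of one way to combine the single-value probability with the number of observations. It assumes the test values are independent and, for the parametric option, that the "known" data are roughly normal. All function names are hypothetical, and this is not the method from the accepted answer, whose text is not shown here. `single_tail_p` gives the lower-tail probability of one value; `min_of_n_p` turns it into the probability that the lowest of n draws from the reference distribution is at least that low, via 1 - (1 - p)^n; `ks_statistic` computes the two-sample Kolmogorov-Smirnov D for comparison (MATLAB's `kstest2` runs the corresponding two-sample test).

```python
import bisect
from statistics import NormalDist, mean, stdev


def single_tail_p(known, x, parametric=True):
    """Lower-tail probability of a single value x under the reference sample.

    parametric=True ASSUMES the reference data are roughly normal and uses a
    fitted NormalDist; otherwise a conservative empirical p-value is used.
    """
    if parametric:
        return NormalDist(mean(known), stdev(known)).cdf(x)
    s = sorted(known)
    # (number of reference values <= x, plus one) / (reference size, plus one)
    return (bisect.bisect_right(s, x) + 1) / (len(s) + 1)


def min_of_n_p(p_single, n):
    """Probability that the lowest of n iid draws is at least this extreme.

    Under H0 each value's tail probability is ~Uniform(0, 1), so
    P(min <= p) = 1 - (1 - p)**n (the Sidak correction).
    """
    return 1.0 - (1.0 - p_single) ** n


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov D = max |ECDF_a(x) - ECDF_b(x)|.

    Both ECDFs are step functions that only jump at data points, so
    checking every pooled data point finds the maximum gap.
    """
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    for x in sa + sb:
        fa = bisect.bisect_right(sa, x) / len(sa)
        fb = bisect.bisect_right(sb, x) / len(sb)
        d = max(d, abs(fa - fb))
    return d


def outlier_p_value(known, test, parametric=True):
    """p-value for 'the smallest test value is this low' under H0."""
    p = single_tail_p(known, min(test), parametric)
    return min_of_n_p(p, len(test))


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    known = [rng.gauss(0.0, 1.0) for _ in range(500)]
    # 99 values from the same distribution plus one planted low value
    test = [rng.gauss(0.0, 1.0) for _ in range(99)] + [-4.5]
    print("single-value tail p:", single_tail_p(known, min(test)))
    print("corrected for n =", len(test), ":", outlier_p_value(known, test))
    print("KS D:", ks_statistic(known, test))
```

On the design choice: the KS statistic is the largest gap between the two empirical CDFs, and one value among n test observations can shift the test ECDF by at most 1/n, so D barely reacts to a single outlier. The minimum-based p-value targets Hypothesis 1 directly. Because the reference distribution is estimated, the result is approximate, and the empirical option can never return a single-value probability below 1/(len(known) + 1).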
Thanks in advance.
Accepted Answer
More Answers (1)
per isakson
on 12 Sep 2012
0 votes