
From HandWiki

Statistical Significance and Discovery

Experimental measurements always carry the risk that an apparent effect is nothing but a statistical accident. In such cases we need to estimate the level of statistical significance (credibility) of an observation, i.e. to demonstrate that the likelihood of such an observation arising by chance is small. One can also approach this question from the opposite direction, by estimating the probability that the observed result is due to a statistical accident: a large value of this probability corresponds to a low statistical significance (and vice versa). For a good recent review see signif_book.

The probability we are talking about is quantified by a number called the p-value. Mathematically, it is the probability of obtaining a result at least as extreme as the one observed, given that a certain statement (the null hypothesis) is true. The p-value is close to 1 when the observed results could easily have occurred by chance. Observations are statistically significant when the p-value is small, i.e. when one can demonstrate that the probability of obtaining the observations by chance alone is low.

Let us give an example. Historical records on sunspots (relatively dark areas on the surface, or photosphere, of the Sun) suggest that solar activity is at the level of 80 sunspots per year. The number of sunspots per year is normally distributed with the standard deviation [math]\displaystyle{ \sigma=20 }[/math], and the records are large enough for the Central Limit Theorem to apply. During one year, however, 130 sunspots were observed. Is this finding significant? Does it indicate the beginning of unusual solar activity?

To estimate the likelihood of observing 130 (or more) sunspots in a given year, we will calculate the area under the Normal (Gaussian) probability density function with mean 80 and [math]\displaystyle{ \sigma=20 }[/math] from minus infinity to 130. This value should then be subtracted from 1 to obtain the probability of events with [math]\displaystyle{ \gt 130 }[/math] sunspots.

from org.apache.commons.math3.distribution import *
no=NormalDistribution(80, 20)   # mean 80, standard deviation 20
print "p-value=",1-no.cumulativeProbability(130)

The output probability is 0.0062. It shows that the significance of the observation is high: only about 6 years in 1000 would show 130 or more sunspots by chance alone. This value was obtained using the analytic description of the Normal distribution implemented in the Apache Commons Math library. In reality, the underlying distribution (and the questions asked about significance) can be rather complex, so one often needs numerical methods to estimate statistical significance.
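
As a cross-check, the same analytic p-value can be obtained in plain CPython (rather than the Jython used above) with the standard-library NormalDist class; this is a minimal sketch that simply repeats the numbers of the example:

```python
from statistics import NormalDist

# Null hypothesis: yearly sunspot counts follow N(mean=80, sigma=20)
no = NormalDist(mu=80, sigma=20)

# One-tailed p-value for observing 130 or more sunspots in a year
p_value = 1 - no.cdf(130)
print("p-value =", p_value)  # about 0.0062
```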

One can instead run an experiment in which many events are generated from a Normal distribution with mean 80 and [math]\displaystyle{ \sigma=20 }[/math], counting the fraction of events with values of 130 or more. Here is the code that numerically estimates the p-value and plots a histogram of the null hypothesis:

from cern.jet.random.engine import *
from cern.jet.random import *
from jhplot import *

nr=Normal(80,20,MersenneTwister())   # Normal(mean,sigma) random generator
h1=H1D("Normal",100,0,200)           # histogram: 100 bins in [0,200]
sum=0.; total=100000
for i in range(total):
    va=nr.nextDouble()
    h1.fill(va)
    if (va>=130): sum +=1.0          # count events in the tail
c1=HPlot()
c1.visible(); c1.setAutoRange()
c1.draw(h1)
print "p value=",sum/total

The code prints the expected p-value of [math]\displaystyle{ \simeq 0.0062 }[/math]. The actual precision depends on the number of generated random numbers: one should increase the number of random events to reduce the statistical uncertainty of this estimate.
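
The same Monte Carlo estimate can be sketched in pure Python with the standard random module (no histogram here; the seed is an arbitrary choice made only for reproducibility):

```python
import random

random.seed(1)                # arbitrary seed, for reproducibility
mean, sigma = 80.0, 20.0      # null hypothesis: N(80, 20)
total = 200000
tail = 0
for _ in range(total):
    # Count samples falling in the tail at or above 130
    if random.gauss(mean, sigma) >= 130:
        tail += 1
print("p-value =", tail / total)  # should be close to 0.0062
```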

Now you know the main trick behind numerical estimation of p-values. To calculate a p-value numerically, first make multiple observations and determine the underlying distribution function, i.e. your null hypothesis. Then run numerical experiments and estimate how often the observed (or a more extreme) outcome occurs in a large total number of trials, as in the example above. The only difficult part is to find the correct null hypothesis.

You may ask the opposite question: how many events should we see to obtain a p-value below 0.05, which is a rather common threshold in many fields of science (biology, psychology, physics, etc.)? We should calculate the value x for which the area under the Gaussian probability density function, integrated from minus infinity to x, equals 0.95:

from org.apache.commons.math3.distribution import *
no=NormalDistribution(80, 20)
print "p=0.05:",no.inverseCumulativeProbability(0.95)   # x such that P(X<x)=0.95

This example returns 112.89: observing roughly 113 or more sunspots in a year corresponds to a p-value below 0.05.
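
The equivalent inverse-CDF calculation can be sketched with Python's standard library, where inv_cdf plays the role of inverseCumulativeProbability:

```python
from statistics import NormalDist

# Threshold above which the one-tailed p-value drops below 0.05
threshold = NormalDist(mu=80, sigma=20).inv_cdf(0.95)
print("p=0.05 threshold:", threshold)  # about 112.9
```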

Now let us consider one special case. If events are rare, i.e. when zero is a likely number to observe, we should use a Poisson distribution instead of the Normal distribution. In the sunspot example, one could ask how often sunspots of a certain size or type occur. For example, if we are dealing with rare events whose mean rate is well known to be 0.5, then an observation of 3 events can already be quite significant. Let us calculate the p-value in this situation:

from org.apache.commons.math3.distribution import *
no=PoissonDistribution(0.5)  # mean set to 0.5
print "p-value=",1-no.cumulativeProbability(2) # P(3 or more events)

The code returns the rather small probability of 0.014. Note that cumulativeProbability(2) is the probability of observing 2 or fewer events, so subtracting it from 1 gives the probability of 3 or more.
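
In pure Python the Poisson tail probability can be summed directly from the probability mass function. The sketch below prints both the probability of 3 or more events and that of 4 or more, since the two are easy to confuse when working with cumulative probabilities:

```python
from math import exp, factorial

def poisson_tail(mu, n):
    """Probability of observing n or more events for a Poisson mean mu."""
    # Sum the PMF over 0..n-1 and subtract from 1
    return 1.0 - sum(exp(-mu) * mu**k / factorial(k) for k in range(n))

print("P(3 or more) =", poisson_tail(0.5, 3))  # about 0.014
print("P(4 or more) =", poisson_tail(0.5, 4))  # about 0.0018
```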

In real life, expectations for experimental outcomes carry uncertainties. Let us calculate the statistical significance for an experiment that observes 7 events while the expectation is [math]\displaystyle{ 3.1\pm 0.3 }[/math], where 0.3 is the uncertainty. In many situations the uncertainty can be treated in the frequentist approach, i.e. we assume that 0.3 is the standard deviation of a Gaussian distribution with mean 3.1 (which is often appropriate if the prediction itself was obtained using probabilistic techniques). To account for the uncertainty of the prediction, the mean of the Poisson distribution should be drawn from a Gaussian distribution with the mean and width of the prediction Highland.

from org.apache.commons.math3.random import *

r=RandomDataGenerator()
observed=7.        # observed number of events
bkg=3.1            # expected value
err=0.3            # uncertainty of the expectation
tot=1000000; sum1=0.
for i in range(tot):
    mean=r.nextGaussian(bkg, err)   # smear the Poisson mean
    value=r.nextPoisson(mean)
    if (value>=observed): sum1 +=1.0
print "p=", sum1/tot

This code returns 0.041. Note that the above example is not at all abstract: it is taken from the real-life situation of the top-quark discovery, where the prediction was the "expected background rate" and the number 7 was the total number of observed top-quark candidates pekka.
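
A pure-Python sketch of the same calculation follows. The Poisson sampler uses Knuth's multiplication algorithm, which is adequate for small means; the seed is an arbitrary choice for reproducibility:

```python
import random
from math import exp

def poisson_sample(mu):
    """Draw a Poisson-distributed integer with mean mu (Knuth's algorithm)."""
    limit = exp(-mu)
    k, p = 0, 1.0
    while p > limit:
        k += 1
        p *= random.random()
    return k - 1

random.seed(1)                  # arbitrary, for reproducibility
observed, bkg, err = 7, 3.1, 0.3
total = 200000
tail = 0
for _ in range(total):
    # Smear the Poisson mean with the Gaussian uncertainty (kept non-negative)
    mean = max(random.gauss(bkg, err), 0.0)
    if poisson_sample(mean) >= observed:
        tail += 1
print("p =", tail / total)      # should be close to 0.041
```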

If you are interested in a more conservative estimate of the p-value, or have doubts about the statistical nature of the uncertainty, remove the Gaussian smearing and instead shift the mean of the Poisson distribution by the uncertainty, i.e. evaluate the Poisson distribution with the mean 3.4.

Discovery sensitivity

It should be said that, in many areas of science, events in a counting experiment are considered statistically significant when the p-value is less than 0.05, i.e. when the chance of obtaining such an observation from statistical fluctuations alone is below 5%. In some scientific fields, such as particle physics, a discovery is announced only when the p-value is less than [math]\displaystyle{ 3\cdot 10^{-7} }[/math] (assuming a one-tailed p-value). This value corresponds to a five-sigma ([math]\displaystyle{ 5\sigma }[/math]) observation, i.e. this probability is obtained by integrating the tail of the Normal distribution above 5 standard deviations. Put another way, the observation would occur by chance alone as rarely as a value sampled from a Gaussian distribution would lie five standard deviations above the mean. The Gaussian significance, denoted Z, is related to the p-value as [math]\displaystyle{ Z=\Phi^{-1}(1-p) }[/math], where [math]\displaystyle{ \Phi^{-1} }[/math] is the inverse of the cumulative distribution function of the standard Normal distribution.

Let us see how we can convert the p-value to the Z value in the expression [math]\displaystyle{ Z\cdot\sigma }[/math]:

from org.apache.commons.math3.distribution import *
p=NormalDistribution()
print "Z=",-p.inverseCumulativeProbability(3E-07)

You will see a value of Z close to 5, i.e. we can speak of a [math]\displaystyle{ 5\sigma }[/math] observation.
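
The same conversion can be sketched in pure Python with the standard-library NormalDist, whose inv_cdf method is the inverse cumulative distribution [math]\displaystyle{ \Phi^{-1} }[/math]:

```python
from statistics import NormalDist

p = 3e-7
# Z = Phi^{-1}(1 - p); by the symmetry of the Gaussian this equals -Phi^{-1}(p)
Z = NormalDist().inv_cdf(1 - p)
print("Z =", Z)  # close to 5
```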

If the p-value is smaller than about 0.0013 (which corresponds to one-tailed three sigma, [math]\displaystyle{ 3\sigma }[/math]), one can report evidence for a new phenomenon (or particle) being observed. The reason particle physicists impose such strong requirements is that new laws of physics should be established with much stronger evidence than the usual p-value of 0.05, which still leaves considerable room for statistical fluctuations and various instrumental anomalies. In addition, the five-sigma rule is good protection against problems in estimating the p-value in real life, when the background model (i.e. the null hypothesis) is not well known and measurements can be affected by uncertainties that are hard to foresee. Such uncertainties are not statistical in nature; they are usually called systematic uncertainties.

Now let us come back to the previous example and calculate the p-values that correspond to three sigma ([math]\displaystyle{ 3\sigma }[/math]) and five sigma ([math]\displaystyle{ 5\sigma }[/math]). Then we estimate the numbers of events we should observe to claim 3 and 5 [math]\displaystyle{ \sigma }[/math] discoveries:

from org.apache.commons.math3.distribution import *
mean=80; sd=20
no=NormalDistribution(mean, sd)
p1=no.cumulativeProbability(mean+3*sd)   # P(X < mean+3*sigma)
p2=no.cumulativeProbability(mean+5*sd)   # P(X < mean+5*sigma)
print "p-value for 3 sigma:",1-p1
print "Events:",int(no.inverseCumulativeProbability(p1))
print "p-value for 5 sigma:",1-p2
print "Events:",int(no.inverseCumulativeProbability(p2))


This prints the answer:

p-value for 3 sigma: 0.00135
Events: 139
p-value for 5 sigma: 2.86651E-07
Events: 180
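
The same thresholds can be sketched in pure Python (note that the event counts are simply mean plus 3 or 5 standard deviations; the 139 in the output above is a floating-point truncation of 140 by int()):

```python
from statistics import NormalDist

mean, sd = 80, 20
no = NormalDist(mu=mean, sigma=sd)

for n_sigma in (3, 5):
    events = mean + n_sigma * sd   # threshold in numbers of sunspots
    p = 1 - no.cdf(events)         # one-tailed p-value at that threshold
    print(n_sigma, "sigma:", events, "events, p-value", p)
```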

As you can see, claiming a new phenomenon behind high solar activity requires the observation of about 180 sunspots in a single year under the [math]\displaystyle{ 5\sigma }[/math] approach, while observing about 140 sunspots in a single year is enough for evidence of unusual solar activity under the [math]\displaystyle{ 3\sigma }[/math] criterion. As explained before, simple situations in which the null hypothesis is well established and systematic variations (i.e. those that do not occur by chance) are well understood do not require the [math]\displaystyle{ 5\sigma }[/math] extreme; values of p=0.05 or p=0.01 are usually sufficient.


This section is discussed in more detail in the book "Numeric Computation and Statistical Data Analysis on the Java Platform" chekanovbook by S.Chekanov.


Author: Sergei Chekanov


References