Compbio 018: Getting more significance out of R

While doing research, I have performed statistical tests that have yielded incredibly small p-values. This is not unusual when working with large datasets in genomics. In fact, when using R, I often get p-values so small that the printed test summary no longer shows their value. Anything below 2.2e-16, the machine epsilon for double-precision numbers, is reported simply as "p-value < 2.2e-16". Here is such an example:

$ a <- 1:10
$ b <- 100:110
$ t.test(a,b)

        Welch Two Sample t-test

data:  a and b
t = -71.87, df = 18.998, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -102.39768  -96.60232
sample estimates:
mean of x mean of y
      5.5     105.0

As you can see, we only know that the p-value is smaller than 2.2e-16, so comparing any two p-values below this threshold is impossible from the printed output. One way to get the estimated p-value itself is to append "$p.value" to your statistical test in R:

$ t.test(a,b)$p.value
[1] 1.311457e-24
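For context, here is a minimal sketch of why this works, assuming a standard base R installation. t.test() returns a list-like object of class "htest", and the "< 2.2e-16" text appears to come only from how that object is printed. The variable name res is mine, and eps = 0 is just one way to switch off the cutoff in format.pval().

```r
a <- 1:10
b <- 100:110

# Store the test result instead of printing it
res <- t.test(a, b)

# The result is a list; p.value is one of its components
names(res)

# The stored p-value is an ordinary double, far below the printing cutoff
res$p.value
.Machine$double.eps

# By default, format.pval() replaces anything below eps with "< eps"
format.pval(res$p.value)

# Setting eps = 0 disables that replacement
format.pval(res$p.value, eps = 0)
```

So the number was there all along; only the printed summary hides it.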

Here we have the p-value as a specific number, rather than the very broad category of "< 2.2e-16". If we run a different test, between sample a and another sample c, the printed summary would again report only that it is highly significant (p-value < 2.2e-16), but "$p.value" gives the actual estimate:

$ c <- 60:70
$ t.test(a,c)$p.value
[1] 2.159892e-20

However, examining the estimated p-value above shows that it is larger than the one from the a vs b test, which is what we would expect given the tested values.
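The same comparison can be sketched for several tests at once. This is only an illustration: the names samples, pvals and neg_log10_p are mine, and -log10 is used simply because it makes very small p-values easier to read and sort.

```r
a <- 1:10

# Hypothetical collection of samples to compare against a
samples <- list(b = 100:110, c = 60:70)

# Run each test and keep only the numeric p-value
pvals <- sapply(samples, function(s) t.test(a, s)$p.value)

# Larger -log10(p) means a smaller p-value
neg_log10_p <- -log10(pvals)

# Order the comparisons from most to least significant
sort(neg_log10_p, decreasing = TRUE)
```

With the two samples used in this post, the a vs b comparison should come out first.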

Whether you should do this is a different matter. Forums are filled with complex discussions on the loss of accuracy in calculating numbers < 2.2e-16. But if you "need" to present numbers below 2.2e-16 from statistical tests in R, this is a simple solution to that problem. And if the solution is problematic, I look forward to an active discussion about why and what else we can do.
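On the accuracy concern, one option worth considering is to compute the p-value on the log scale, so that it does not underflow to zero even for far more extreme test statistics. The sketch below assumes a two-sided t-test and reuses the statistic and degrees of freedom stored in the test object. The variable names are mine; pt() and its log.p argument are part of base R.

```r
a <- 1:10
b <- 100:110
res <- t.test(a, b)

# Test statistic and degrees of freedom stored in the result
t_stat <- unname(res$statistic)
dof <- unname(res$parameter)

# Two-sided p-value on the natural-log scale: log(2 * P(T <= -|t|))
log_p <- log(2) + pt(-abs(t_stat), dof, log.p = TRUE)

# Convert to log10 for reporting, e.g. "p = 10^x"
log10_p <- log_p / log(10)
log10_p

# Sanity check: back-transforming should closely match the stored p-value
all.equal(exp(log_p), res$p.value)
```

Reporting log10_p keeps very small p-values comparable without relying on the tiny number itself being stored precisely. I have not checked how this behaves for other tests.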

For more discussion on this topic, and the code I used above, check out this Q&A on stackoverflow:

https://stackoverflow.com/questions/6970705/why-cant-i-get-a-p-value-smaller-than-2-2e-16/6970722

British geneticist interested in splicing, RNA decay, and synthetic biology. This is my blog focusing on my adventures in computational biology.

Jan 9 Compbio 018: Getting more significance out of R
