Python for Bioinformatics: Student's t test 3

Thursday, August 13, 2009

Student's t test 3

The paired t test is used when two sets of values are related, for example because each of a pair of measurements was made on the same subject.

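In this case, the test is carried out on the differences between the paired values. Under the null hypothesis, the mean difference divided by its standard error follows a t distribution with n - 1 degrees of freedom, where n is the number of pairs.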
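This can be checked with a small simulation. The sketch below assumes the differences are normally distributed with mean 0. The values n = 11, sd = 350, the seed, and the number of replicates are illustrative choices, not part of the example that follows.

```r
# Simulate paired differences under the null hypothesis (assumed: normal, mean 0)
set.seed(1)
n = 11
tstat = replicate(20000, {
  d = rnorm(n, mean=0, sd=350)   # hypothetical differences
  mean(d) / (sd(d)/sqrt(n))      # mean difference over its standard error
})

# Compare simulated quantiles with those of the t distribution, df = n - 1
probs = c(0.025, 0.5, 0.975)
print(round(quantile(tstat, probs), 2))
print(round(qt(probs, df=n-1), 2))
```

The two rows of quantiles should agree to within simulation error.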

This example is from Dalgaard (Introductory Statistics with R).

pre = c(5260,5470,5640,
  6180,6390,6515,6805,
  7515,7515,8230,8770)
post = c(3910,4220,3885,
  5160,5645,4680,5265,
  5975,6790,6900,7335)

plot(pre,post,pch=16,
  col='blue',cex=2)

diff = post-pre

Not only are the values correlated, but the difference is always negative:

> diff
 [1] -1350 -1250 -1755 -1020
 [5]  -745 -1835 -1540 -1540
 [9]  -725 -1330 -1435
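Both observations can be checked numerically rather than read off the plot. This is a minimal sketch that repeats the data so it runs on its own; cor() computes the Pearson correlation by default.

```r
pre = c(5260,5470,5640,6180,6390,6515,6805,7515,7515,8230,8770)
post = c(3910,4220,3885,5160,5645,4680,5265,5975,6790,6900,7335)
diff = post - pre

# Pearson correlation between the paired measurements
print(cor(pre, post))

# TRUE if every subject's value went down
print(all(diff < 0))
```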

t.test(pre,post,paired=T)

> t.test(pre,post,paired=T)

 Paired t-test

data:  pre and post 
t = 11.9414, df = 10,
p-value = 3.059e-07
alternative hypothesis: true difference 
in means is not equal to 0 
95 percent confidence interval:
 1074.072 1566.838 
sample estimates:
mean of the differences 
               1320.455
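A paired t test amounts to a one-sample t test on the differences. The sketch below illustrates this by running both on the same data; the variable names paired and one.sample are just labels chosen here.

```r
pre = c(5260,5470,5640,6180,6390,6515,6805,7515,7515,8230,8770)
post = c(3910,4220,3885,5160,5645,4680,5265,5975,6790,6900,7335)

# Paired test on the two vectors
paired = t.test(pre, post, paired=TRUE)

# One-sample test on the differences, null hypothesis: mean difference is 0
one.sample = t.test(pre - post, mu=0)

print(c(paired=unname(paired$statistic),
        one.sample=unname(one.sample$statistic)))
print(c(paired=paired$p.value,
        one.sample=one.sample$p.value))
```

The statistic and p-value should be the same for both calls.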

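We can do the test by hand, as follows. Note that t.test(pre,post) reports pre - post, so its mean difference is positive, while diff = post - pre is negative: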

> mean(diff)
[1] -1320.455
> sd(diff)
[1] 366.7455

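> x = sd(diff)/sqrt(length(diff))
> x
[1] 110.5779
> abs(mean(diff))/x
[1] 11.94139

The standard error divides by the square root of the number of pairs (11), and the result matches the t = 11.9414 reported by t.test. The question now is, what fraction of the values from the t-distribution with df = 10 are greater than 11.94?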

S = seq(0,1,by=0.001)
w = rt(1000000,df=10)
y = quantile(w,S)
round(tail(y))

> round(tail(y))
 99.5%  99.6%  99.7% 
     3      3      3 
 99.8%  99.9% 100.0% 
     4      4     11

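The short answer: not very many! The largest of the million simulated values rounds to 11, which is below the t = 11.9414 reported by t.test.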
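The tail area does not have to be estimated by simulation. As a sketch, pt() gives the exact tail probability and qt() the critical value for the confidence interval. The names d, se, tval, p and ci are chosen here for illustration, and the data are repeated so the block runs on its own.

```r
pre = c(5260,5470,5640,6180,6390,6515,6805,7515,7515,8230,8770)
post = c(3910,4220,3885,5160,5645,4680,5265,5975,6790,6900,7335)

d = pre - post            # same sign convention as t.test(pre,post)
n = length(d)             # number of pairs
se = sd(d)/sqrt(n)        # standard error of the mean difference
tval = mean(d)/se

# Two-sided p-value: twice the upper tail area beyond |t|
p = 2*pt(abs(tval), df=n-1, lower.tail=FALSE)

# 95 percent confidence interval for the mean difference
ci = mean(d) + c(-1, 1)*qt(0.975, df=n-1)*se

print(tval)
print(p)
print(ci)
```

These values should agree with the t.test output shown earlier (t = 11.9414, p-value = 3.059e-07, interval 1074.072 to 1566.838).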