Wednesday, August 12, 2009

Student's t test

This post is a simple demo of using R to carry out Student's t test.

Let's look at a population of values with a normal distribution, mean = 5 and standard deviation = 1.

set.seed(157)
v = rnorm(10000,5,1)


We draw a sample of 4 values without replacement (the default for sample):

e1 = sample(v,4)


> round(e1,2)
[1] 5.34 4.80 5.56 4.96


A t test for one sample tests the null hypothesis that the mean μ of the population from which the sample is drawn is equal to μ0. For example, we might have many observations of untreated cells (from which we get μ0), and now wish to test whether the mean value for treated cells is detectably different.

result = t.test(e1,mu=6)


The argument alternative = 'two.sided' is the default, so we don't need to specify it here.

> result

One Sample t-test

data: e1
t = -4.7613, df = 3, p-value = 0.01759
alternative hypothesis: true mean is not equal to 6
95 percent confidence interval:
4.608507 5.723439
sample estimates:
mean of x
5.165973


Even with only four observations and a true difference in means of just 1 (6 - 5, i.e. one standard deviation), the t test lets us reject the null hypothesis that μ = μ0 = 6 at the 0.05 level, with p = 0.018.
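To see where these numbers come from, here is a minimal sketch that recomputes the one-sample statistic by hand and compares it with t.test. It assumes the four rounded values printed earlier stand in for e1, so the results will differ slightly from the output above, which used the unrounded sample. The names x, mu0, se, t_stat, p_two and ci are only illustrative.

```r
# Rounded sample values from the post (an assumption: the real e1 is unrounded)
x   <- c(5.34, 4.80, 5.56, 4.96)
mu0 <- 6                          # hypothesized mean under the null

n      <- length(x)
se     <- sd(x) / sqrt(n)         # standard error of the mean
t_stat <- (mean(x) - mu0) / se    # t = (sample mean - mu0) / SE
df     <- n - 1                   # degrees of freedom

# Two-sided p-value: probability in both tails beyond |t|
p_two <- 2 * pt(-abs(t_stat), df)

# 95 percent confidence interval for the mean
ci <- mean(x) + c(-1, 1) * qt(0.975, df) * se

# Compare with R's built-in test on the same values
res <- t.test(x, mu = mu0)
print(c(by_hand = t_stat, t.test = unname(res$statistic)))
print(c(by_hand = p_two,  t.test = res$p.value))
print(rbind(by_hand = ci, t.test = as.numeric(res$conf.int)))
```

The two rows of each comparison should agree, which shows that t.test is doing nothing more than this arithmetic.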

Now, it might have been the case that before we saw the data (and that proviso is crucial), we expected from the nature of the treatment that the mean of the treated population would be less than that of the untreated population. In that case, we would be justified in specifying a one-sided test:

result = t.test(e1,mu=6,alternative='less')


We note the p-value is:

p-value = 0.008796
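This is half the two-sided p-value, because the observed t is negative and so lies in the direction of the alternative. A small check, using only the t and df values printed in the one-sample output (they are rounded, so the agreement is approximate):

```r
# Values copied from the printed one-sample output
t_stat <- -4.7613
df     <- 3

# One-sided p-value for alternative = 'less': the lower tail only
p_less <- pt(t_stat, df)

# Two-sided p-value: both tails beyond |t|
p_two <- 2 * pt(-abs(t_stat), df)

# For comparison, alternative = 'greater' would use the upper tail
p_greater <- pt(t_stat, df, lower.tail = FALSE)

print(c(less = p_less, two.sided = p_two, greater = p_greater))
```

Had we guessed the wrong direction (alternative = 'greater'), the p-value would be close to 1, which is why the direction has to be chosen before looking at the data.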


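In reality, because of biological variation (as well as unintended variation in experimental conditions), we would always include a control group in such an experiment. Here the control values are drawn from a second population with mean = 6 and standard deviation = 1.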

w = rnorm(10000,6,1)
e2 = sample(w,4)

result = t.test(e1,e2,alternative='less')


> result

Welch Two Sample t-test

data: e1 and e2
t = -1.3385, df = 3.416, p-value = 0.1314
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
-Inf 0.6194634
sample estimates:
mean of x mean of y
5.165973 6.084764
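Note that with two samples R defaults to the Welch test, which does not assume equal variances and gives a non-integer df. As a rough sketch of what it computes, the code below rebuilds the Welch statistic and the Welch-Satterthwaite degrees of freedom by hand. The data are assumptions: x is the rounded e1 from earlier, and y is a freshly generated stand-in control sample (with an arbitrary seed), not the e2 used above.

```r
x <- c(5.34, 4.80, 5.56, 4.96)   # rounded e1 (assumption)
set.seed(1)                      # arbitrary seed for the stand-in control
y <- rnorm(4, 6, 1)              # hypothetical control sample, not the post's e2

# Squared standard error contributed by each group
vx <- var(x) / length(x)
vy <- var(y) / length(y)

# Welch t statistic: difference in means over the combined standard error
t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)

# Welch-Satterthwaite approximation to the degrees of freedom
df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# One-sided p-value for alternative = 'less'
p_less <- pt(t_stat, df)

res <- t.test(x, y, alternative = 'less')
print(c(t = t_stat, df = df, p = p_less))
print(c(t = unname(res$statistic), df = unname(res$parameter), p = res$p.value))
```

Passing var.equal = TRUE to t.test would instead give the classical pooled-variance test with df = 6 for two groups of four.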


Now it is much harder to obtain a significant result: with four observations per group, p = 0.13. If there are 25 observations in the control group, we can detect the difference:

result = t.test(e1,rnorm(25,6,1),alternative='less')


> result

Welch Two Sample t-test

data: e1 and rnorm(25, 6, 1)
t = -3.4116, df = 11.547, p-value = 0.002717
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
-Inf -0.4127716
sample estimates:
mean of x mean of y
5.165973 6.033374
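A single comparison like this could be luck, so it helps to ask how often each design would detect the difference. The simulation below is a sketch under some assumptions: it draws directly from the two normal distributions rather than from v and w, uses an arbitrary seed and 4000 repetitions, and the helper name reject_rate is made up for this example.

```r
set.seed(1)  # arbitrary seed

# Fraction of simulated experiments in which the one-sided Welch test
# rejects the null at level alpha, for a given control group size.
reject_rate <- function(n_control, n_treated = 4, reps = 4000, alpha = 0.05) {
  p <- replicate(reps, {
    treated <- rnorm(n_treated, 5, 1)   # treated population: mean 5, sd 1
    control <- rnorm(n_control, 6, 1)   # control population: mean 6, sd 1
    t.test(treated, control, alternative = 'less')$p.value
  })
  mean(p < alpha)
}

power_4  <- reject_rate(4)    # 4 treated vs 4 control
power_25 <- reject_rate(25)   # 4 treated vs 25 control

print(c(control_4 = power_4, control_25 = power_25))
```

The rejection rate is an estimate of the power of each design. Enlarging only the control group helps, but the four treated observations still limit how high the power can go.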