Saturday, July 25, 2009

Bayes 9: mean with unknown population variance

Before I solve Chuck's problem from last time, I should show the case where we are trying to estimate the population mean but do not know the population variance. In that situation, it makes sense to calculate the sample variance:

s^2 = sum((x_i - xbar)^2) / (n - 1)

and use that as an estimate of the population variance. As you probably know if you are reading this, estimating the variance adds uncertainty, so when we use these results to estimate the mean we need to widen the credible interval by using Student's t table (with n - 1 degrees of freedom) instead of the standard normal table.

To continue with the previous example, Arnie had a normal(30, 4^2) prior, that is, a prior mean of 30 and a prior standard deviation of 4. Suppose we have only 5 observations: 31.1, 28.2, 34.2, 35, 31.5.

We calculate the sample variance:

In R:
v = c(31.1,28.2,34.2,35,31.5)
m = mean(v)
m
# 32
w = (v-m)^2
# divide by n - 1 = 4
sum(w)/4
# 7.335
var(v)
# 7.335


In Python:
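L = [31.1,28.2,34.2,35,31.5]
def mean(L): return sum(L)*1.0/len(L)
m = mean(L)
# squared deviations from the mean, kept separate from the data
sq = [(x-m)**2 for x in L]
# divide by n - 1 = 4
print(round(sum(sq)/(len(L)-1), 3))
# 7.335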
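R's var(v) has a counterpart in Python's standard library. As a small cross-check (the variable names here are just for illustration), statistics.variance also divides by n - 1, so it should agree with the hand calculation above:

```python
import statistics

data = [31.1, 28.2, 34.2, 35, 31.5]

# statistics.variance is the sample variance (divides by n - 1)
s2 = statistics.variance(data)
# statistics.stdev is its square root
s = statistics.stdev(data)

print(round(s2, 3), round(s, 3))
# 7.335 2.708
```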


prior precision = 1/4^2 = 1/16
observation precision = 5/7.335

posterior precision
= 1/16 + 5/7.335
= 0.0625 + 0.6817 = 0.744

posterior variance = 1/precision
= 1/0.744 = 1.344
posterior st dev = sqrt(variance)
= 1.16


The weights are:

prior  0.0625/0.744 = 0.084
observation 0.6817/0.744 = 0.916


The posterior mean is then:

mean = 0.084*30 + 0.916*32 = 31.83
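The arithmetic above can be collected into one function. This is only a sketch of the same precision-weighted update: the function name posterior_normal and its argument names are made up for illustration, and it assumes the sample variance is simply plugged in for the unknown population variance, as in the text.

```python
import math

def posterior_normal(prior_mean, prior_sd, data):
    """Precision-weighted update for a normal mean, with the sample
    variance standing in for the unknown population variance."""
    n = len(data)
    xbar = sum(data) / n
    s2 = sum((x - xbar) ** 2 for x in data) / (n - 1)  # sample variance

    prior_prec = 1.0 / prior_sd ** 2   # 1/16 = 0.0625
    obs_prec = n / s2                  # 5/7.335
    post_prec = prior_prec + obs_prec

    # weighted average of prior mean and sample mean
    post_mean = (prior_prec * prior_mean + obs_prec * xbar) / post_prec
    post_sd = math.sqrt(1.0 / post_prec)
    return post_mean, post_sd

data = [31.1, 28.2, 34.2, 35, 31.5]
post_mean, post_sd = posterior_normal(30, 4, data)
print(round(post_mean, 3), round(post_sd, 3))
# 31.832 1.159
```

Working with precisions rather than variances keeps the update to a single addition, and the weights 0.084 and 0.916 fall out as each precision divided by the posterior precision.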


As a check, the normnp function from the Bolstad package in R gives the same result:

library(Bolstad)
v = c(31.1,28.2,34.2,35,31.5)
normnp(v,30,4,ret=T)

which prints:
Standard deviation of the residuals :2.708
Posterior mean : 31.8320261
Posterior std. deviation : 1.1592201
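The post mentions widening the credible interval with Student's t but does not compute it. A rough sketch of that last step follows, assuming a 95% interval and n - 1 = 4 degrees of freedom. The critical values are copied from a standard t table, and the names T_975, Z_975 and credible_interval are hypothetical.

```python
# Upper 2.5% critical values from a standard Student's t table,
# keyed by degrees of freedom (only a few rows are listed here).
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571}
Z_975 = 1.960  # standard normal value, for comparison

def credible_interval(post_mean, post_sd, n, use_t=True):
    """95% interval around the posterior mean; uses t with n - 1
    degrees of freedom when the variance was estimated from the data."""
    crit = T_975[n - 1] if use_t else Z_975
    half = crit * post_sd
    return post_mean - half, post_mean + half

lo, hi = credible_interval(31.832, 1.159, 5)
print(round(lo, 2), round(hi, 2))
# 28.61 35.05

z_lo, z_hi = credible_interval(31.832, 1.159, 5, use_t=False)
print(round(z_lo, 2), round(z_hi, 2))
# 29.56 34.1
```

With only 5 observations the t interval is noticeably wider than the normal one, which is the extra uncertainty from estimating the variance.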