Sunday, July 26, 2009

A bit more about variance

As I alluded to, there is a nice discussion of why variances add here by a statistics teacher named Dave Bock. I think his discussion is terrific. He gives a number of important applications of the result (and if they really are AP course-level material, that is just amazing).

But, as that article makes clear, variances add only if the two variables are independent (more precisely, uncorrelated, which independence guarantees). And he gives a great example of what goes wrong otherwise.

Consider a survey in which we ask people two questions: During the last 24 hours, how many hours were you asleep? And how many hours were you awake?

There will be some mean number of sleeping hours for the group, with some standard deviation. There will also be a mean and standard deviation of waking hours. But now let's sum the two answers for each person. What's the standard deviation of this sum? It's 0, because that sum is 24 hours for everyone -- a constant. Clearly variances did not add here.
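To make the sleep example concrete, here is a small sketch in Python. The five survey answers are made up by me for illustration (they are not from Bock's article), and pcovariance is a helper I wrote for this post. It compares the naive sum of the variances with the actual variance of the total, and with the general rule Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y).

```python
from statistics import pvariance

# Hypothetical survey answers (hours asleep); made up for illustration.
asleep = [6.0, 7.5, 8.0, 5.5, 9.0]
# Hours awake are completely determined by hours asleep: as dependent as it gets.
awake = [24.0 - h for h in asleep]
total = [s + w for s, w in zip(asleep, awake)]


def pcovariance(xs, ys):
    """Population covariance: mean product of deviations from the two means."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


var_sleep = pvariance(asleep)
var_awake = pvariance(awake)
cov = pcovariance(asleep, awake)

print("Var(asleep) + Var(awake):", var_sleep + var_awake)   # positive
print("Var(asleep + awake):     ", pvariance(total))        # zero
print("with the 2*Cov term:     ", var_sleep + var_awake + 2 * cov)
```

The covariance here is negative and exactly large enough to cancel the two variances, which is why the total has no spread at all.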


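For independent variables the rule does hold, and Bock calls it the Pythagorean Theorem of Statistics:

SD²(X ± Y) = SD²(X) + SD²(Y)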


Just as the Pythagorean theorem applies only to right triangles, this relationship applies only to independent random variables.
The name helps kids remember both the relationship and the restriction.
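For contrast with the sleep survey, here is a sketch of a case where the variables really are independent: two fair dice. The dice are my choice of example, not Bock's. Because all 36 pairs are equally likely, the variances can be computed exactly with fractions rather than estimated by simulation.

```python
from fractions import Fraction
from itertools import product


def variance(values):
    """Population variance of equally likely outcomes, in exact arithmetic."""
    values = [Fraction(v) for v in values]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


faces = range(1, 7)
pairs = list(product(faces, faces))  # all 36 equally likely (x, y) outcomes

var_x = variance([x for x, _ in pairs])
var_y = variance([y for _, y in pairs])
var_sum = variance([x + y for x, y in pairs])
var_diff = variance([x - y for x, y in pairs])

print("Var(X), Var(Y):    ", var_x, var_y)
print("Var(X+Y), Var(X-Y):", var_sum, var_diff)
print("Var(X) + Var(Y):   ", var_x + var_y)
```

Both the sum and the difference come out with variance equal to Var(X) + Var(Y), which is the ± in the formula.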

As you may suspect, this analogy is more than a mere coincidence. There's a nice geometric model that represents random variables as vectors whose lengths correspond to their standard deviations. When the variables are independent, the vectors are orthogonal, and then the standard deviation of the sum or difference of the variables is just the hypotenuse of a right triangle.
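One way to make the geometric picture concrete (my own sketch, under the assumption that we list every equally likely outcome) is to turn each variable into a vector of deviations from its mean, scaled by 1/√n. The length of that vector is then the standard deviation, and the dot product of two such vectors is the covariance. The helper names as_vector, dot and length are mine.

```python
import math
from itertools import product


def as_vector(values):
    """Center the values and scale by 1/sqrt(n) so the vector's length is the population SD."""
    n = len(values)
    mean = sum(values) / n
    return [(v - mean) / math.sqrt(n) for v in values]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def length(u):
    return math.sqrt(dot(u, u))


# Two independent fair dice: all 36 equally likely outcomes.
pairs = list(product(range(1, 7), repeat=2))
x = as_vector([a for a, _ in pairs])
y = as_vector([b for _, b in pairs])
s = as_vector([a + b for a, b in pairs])

print("dot(x, y):          ", dot(x, y))  # zero up to rounding: orthogonal
print("SD of the sum:      ", length(s))
print("hypotenuse of SDs:  ", math.hypot(length(x), length(y)))
```

With the two dice the vectors are at right angles, so the SD of the sum is the hypotenuse. In the sleep survey the two vectors would point in exactly opposite directions, and their sum would have length 0.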

He uses the fact that variances add to obtain what he calls a part of the Central Limit Theorem: an expression for the variance of the sample mean x̄ of n independent observations, each with variance σ².

(1) Var(x̄) = Var((x₁ + x₂ + ... + xₙ) / n)
(2) = (1/n²) Var(x₁ + x₂ + ... + xₙ)
(3) = (1/n²) [Var(x₁) + Var(x₂) + ... + Var(xₙ)]
(4) = (1/n²) * n σ²
(5) = σ² / n


In going from line 1 to line 2, we used the fact we discovered last time, that when we divide by a constant, the variance is divided by the square of that constant. In going from line 2 to line 3, we used the result about addition, which requires the observations to be independent. And in going from line 3 to line 4, we used the fact that all n observations have the same variance: Var(x₁) = Var(x₂) = ... = Var(xₙ) = σ². Hence:

  SD(x̄) = √(σ²/n) = σ/√n
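As a check on the derivation, here is a sketch that lists every possible sample of n rolls of a fair die (my choice of population, not Bock's), averages each sample, and compares the exact variance of those averages with σ²/n. The names variance, sigma2 and results are mine.

```python
import math
from fractions import Fraction
from itertools import product


def variance(values):
    """Population variance of equally likely outcomes, in exact arithmetic."""
    values = [Fraction(v) for v in values]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


faces = range(1, 7)
sigma2 = variance(faces)  # variance of a single roll

results = {}
for n in (1, 2, 3, 4):
    # Every equally likely sample of n independent rolls, reduced to its mean.
    means = [Fraction(sum(sample), n) for sample in product(faces, repeat=n)]
    results[n] = variance(means)
    print(n, results[n], sigma2 / n, math.sqrt(results[n]))
```

Each row prints n, the exact variance of the sample mean, σ²/n, and the standard deviation of the mean. The second and third columns agree, and the last column shrinks like 1/√n.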