Saturday, October 20, 2012

Variance and Standard Deviation

Why do we need Standard deviation even though we already have Variance?

The average (arithmetic mean), variance, and standard deviation are the three most basic statistics.

I guess all of my readers are familiar enough with the average, variance, and standard deviation. This post is more about how to teach them to students.

This is the way I taught those concepts when I taught math for GMAT preparation.

There are 10 test scores. Let's say the average is 50 and the variance is 16.

Q1: What is the standard deviation (StDev)?
Easy. Almost every student can answer this question.

What if every test score is doubled?
Average? Sure, still easy. It will be 100.
Q2: Variance? (Or StDev?) Hmm...
I am not sure how many can answer this question right away.

A story of a principal who doesn't know anything about statistics
There is a school principal who wants his teachers to teach so that 1) the overall score is as high as possible and 2) the students' scores are as close to each other as possible. In other words, he wants equality in the test scores.

There are two classes, A and B, in his school. The principal asks for the total sum of the test scores in order to see the overall performance.
$\sum_i^{N} x_i$
Then the teacher of class A argues that he has fewer students than class B, so comparing total sums is not fair. The principal agrees. So they decide to divide the total sum by the number of students, so that the result is the score per student.
$\frac{\sum_i^{N} x_i}{N}$
What is it? Average!

Now the principal wants to measure how equal the scores are, so he asks the teachers to subtract the average from each score and sum up the differences.
$\sum_i^{N} (x_i-\mu)$
It seems like a brilliant idea: the more scattered the scores are, the larger the measure will be. Oops, he realizes that more students will also lead to a larger number. So he decides to divide the measure by the number of students.
$\frac{\sum_i^{N} (x_i-\mu)}{N}$
Now he is satisfied with his brilliant idea.
A problem appears when the two teachers report their numbers: both report zero.
'O-ho, there are positive and negative differences from the average, and they all cancel out!'
'How can I avoid this? Yes, let me square them, so that all the differences become positive.'
He asks the teachers to do so.
$\frac{\sum_i^{N} {(x_i-\mu)}^{2}}{N} $
This is what we call Variance.
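
The principal's two attempts can be tried on a small data set. The sketch below uses ten made-up scores (an assumption for illustration, chosen so that the average is 50 and the variance is 16, as in the example at the top of this post) and computes both the plain mean of the differences and the mean of the squared differences.

```python
# Hypothetical scores, chosen so that the average is 50 and the variance is 16.
scores = [44, 44, 48, 48, 50, 50, 52, 52, 56, 56]

n = len(scores)
mean = sum(scores) / n

# First attempt: average the raw differences from the mean.
deviations = [x - mean for x in scores]
mean_deviation = sum(deviations) / n  # positives and negatives cancel out

# Second attempt: square the differences before averaging.
variance = sum(d ** 2 for d in deviations) / n

print("mean:", mean)
print("mean of differences:", mean_deviation)
print("variance:", variance)
```

The first measure comes out as zero no matter how scattered the scores are, which is exactly what the two teachers reported. Only the squared version gives a usable number.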

The principal is happy with the measure and sees no need to make another one.
Then the scoring system changes. The full score changes from 100 to 200. (It could be because they need to aggregate the scores of several subjects or of multiple tests.) All the scores are doubled. Now everything is doubled, even the distances from the average. So the principal expects the variance to double as well. However, it becomes 4 times as large, because each squared difference $(2x_i-2\mu)^2$ equals $4(x_i-\mu)^2$. He doesn't like this: whenever the full score changes, the variance changes by the square of the scale factor. So he takes the square root of the variance, so that however the scoring system changes, the measure changes by the same scale.
$\sqrt{\frac{\sum_i^{N} {(x_i-\mu)}^{2}}{N}} $
This is the birth of Standard Deviation.
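
The effect of the new scoring system can be checked numerically. This is a minimal sketch with ten made-up scores (an assumption for illustration); the helper names `variance` and `stdev` are my own and simply implement the two formulas above.

```python
from math import sqrt


def variance(xs):
    """Population variance: mean of the squared differences from the mean."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)


def stdev(xs):
    """Standard deviation: square root of the population variance."""
    return sqrt(variance(xs))


# Hypothetical scores with average 50 and variance 16.
scores = [44, 44, 48, 48, 50, 50, 52, 52, 56, 56]
doubled = [2 * x for x in scores]

print(variance(scores), stdev(scores))    # 16.0 4.0
print(variance(doubled), stdev(doubled))  # 64.0 8.0
```

Doubling the scores multiplies the variance by 4 but the standard deviation only by 2, so the standard deviation stays on the same scale as the scores themselves.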

This story may not interest you at all. However, it should work quite well for students.

Now, the answer to Q2 above is as easy as the doubled average.
And what if all the scores are tripled? :)

Try this with your students.
They will even be able to memorize the whole formula for the standard deviation right after the storytelling. (It might be easy for you to memorize, but not for them.)

I believe that the best way to teach is showing the flow of logic. 


  1. Sorry, it took time to fix the math expressions.
    They looked OK on Safari, but not on Chrome or Firefox. I didn't try IE.
    They now look fine on Firefox and Chrome too, so I guess they are OK on IE as well.

  2. This is a very intuitive way to explain the main statistics (mean, variance, stdev). It'd be great if you could continue to post similar examples with intuition for other (more complicated) statistics.