Statistics and C# (Part 3)

In previous posts, I’ve been developing a rudimentary statistical library in C#. The library, NumSkull, currently supports various descriptive statistics.  This post adds the variance and standard deviation estimators to the library.

Variance

The variance of a sample of measurements is a fairly complicated idea that is usually presented as a formula in basic statistics courses without much explanation.  Unfortunately, I intend to do the same thing for this entry, but at some point I’d like to blog about estimators and explore population and sample variance in greater depth. For this post, I present sample variance mathematically and then leverage it to provide the standard deviation.

\(s^2 = \frac{1}{n-1} \sum_{i=1}^n (y_i - \bar y)^2\)
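
To make the formula concrete, here is a small worked example. The data set is made up for illustration: take the five measurements 1, 2, 3, 4, 5, where \(n\) is the number of measurements, \(y_i\) is the i-th measurement, and \(\bar y\) is the sample mean.

\(\bar y = \frac{1 + 2 + 3 + 4 + 5}{5} = 3\)

\(s^2 = \frac{1}{5-1} \left( (-2)^2 + (-1)^2 + 0^2 + 1^2 + 2^2 \right) = \frac{10}{4} = 2.5\)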
 
The LINQ for this is fairly boilerplate: it uses the Mean extension developed previously, with some help from the System.Math class:

public static double Variance(this IEnumerable<double> data)
{
    if (data.IsNullOrEmpty()) return default(double);
    var mean = data.Mean();
    // Squared deviation of each measurement from the mean.
    var values = data.Select(v => Math.Pow(v - mean, 2));
    // Sample variance: sum of squared deviations divided by n - 1.
    var variance = values.Sum() / (data.Count() - 1);
    return variance;
}

Standard Deviation

The sample standard deviation is the positive square root of the sample variance. Because it is expressed in the same units as the measurements, it gives a more directly interpretable picture of the variation in a single set of measurements. Like the sample variance, it is an estimator of the corresponding population quantity.

\(s = \sqrt{s^2}\)
 
All of the LINQ is confined to the Variance method, so this code is just a thin wrapper around the Math.Sqrt function.

public static double StandardDeviation(this IEnumerable<double> data)
{
    var variance = data.Variance();
    var standardDeviation = Math.Sqrt(variance);
    return standardDeviation;
}
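
To try the two estimators outside the library, the following is a minimal self-contained sketch. The class name StatisticsSketch is hypothetical, and the IsNullOrEmpty and Mean helpers are stand-ins written for this example; the real NumSkull versions come from the earlier posts and may behave differently. The sketch also assumes two choices the post does not make: it copies the input to an array so the source is enumerated only once, and it returns 0 when there are fewer than two measurements, since dividing by n - 1 is undefined there.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the library's extension class.
public static class StatisticsSketch
{
    // Assumed behavior: true when the sequence is null or has no elements.
    public static bool IsNullOrEmpty<T>(this IEnumerable<T> data)
    {
        return data == null || !data.Any();
    }

    // Assumed behavior: arithmetic mean, or 0 for a null or empty sequence.
    public static double Mean(this IEnumerable<double> data)
    {
        return data.IsNullOrEmpty() ? default(double) : data.Average();
    }

    // Sample variance: sum of squared deviations divided by n - 1.
    public static double Variance(this IEnumerable<double> data)
    {
        if (data.IsNullOrEmpty()) return default(double);
        var values = data.ToArray();                    // enumerate the source once
        if (values.Length < 2) return default(double);  // assumption: avoid dividing by zero
        var mean = values.Mean();
        var sumOfSquares = values.Sum(v => Math.Pow(v - mean, 2));
        return sumOfSquares / (values.Length - 1);
    }

    // Sample standard deviation: positive square root of the sample variance.
    public static double StandardDeviation(this IEnumerable<double> data)
    {
        return Math.Sqrt(data.Variance());
    }
}
```

With these definitions, calling Variance() on new[] { 1d, 2d, 3d, 4d, 5d } gives 2.5, and StandardDeviation() on the same data gives the square root of 2.5, roughly 1.58.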