
Statistics and C# (Part 1)

I’m currently attending a night class covering statistics for mathematicians at Eastern Michigan University. This class, MATH 370, covers basic concepts of probability; expectation, variance, covariance, distribution functions and their application to statistical tests of hypothesis; bivariate, marginal and conditional distributions; and treatment of experimental data. More information can be found at Eastern’s website, http://www.emich.edu.

To increase my own knowledge of programming with statistical functions and floating point representations, I thought it would be worthwhile to try to develop a library that mirrors the algorithms and functions I am learning about in the class.

I am developing this library in C# 4.0 with VS2010, using TDD as my development methodology. Functions that consume sets of data will be implemented as extension methods that operate solely on the System.Double type, which is a 64-bit implementation of the IEEE-754 standard for representation of floating point numbers (http://en.wikipedia.org/wiki/Double_precision_floating-point_format).
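As a quick illustration of why the floating point representation matters for a statistics library, here is a minimal sketch (the class name DoubleDemo is made up for this example). It shows that a double cannot represent 0.1 or 0.2 exactly, so results are usually compared within a tolerance rather than with ==. The tolerance of 1e-9 is an arbitrary choice for the demo.

```csharp
using System;

public static class DoubleDemo
{
    public static void Main()
    {
        double sum = 0.1 + 0.2;

        // Exact comparison fails: 0.1 and 0.2 have no exact binary representation.
        Console.WriteLine(sum == 0.3);                  // False

        // Comparing within a small tolerance is the usual workaround.
        Console.WriteLine(Math.Abs(sum - 0.3) < 1e-9);  // True

        // System.Double always occupies 8 bytes (64 bits).
        Console.WriteLine(sizeof(double));              // 8
    }
}
```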

The class started off by discussing the need for numerically descriptive measures of a set of data. These measures can be broken down into two categories. The first category consists of algorithms that measure the central tendency of a set of numbers, and the second consists of algorithms that measure dispersion or variation.

The most common measure of central tendency is the arithmetic mean, commonly known as the (plain vanilla) mean or average.

Mean

The mathematical representation of mean is this:

\(\bar y = \frac{1}{n} \sum_{i=1}^n y_i\)

 

There’s a statistical point to be made here. This formula gives the sample mean, not the population mean. In plain English, a sample is a subset of the data, usually much smaller than the entire set (the population), and the sample mean describes only that subset. On its own it is an estimate of the population mean, not the population mean itself; how far you can trust that estimate depends on the size of the sample and how it was drawn.
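
As a small worked illustration (the numbers are made up for this example, not taken from the class), take a sample of n = 4 observations: 2, 4, 6 and 8.

\(\bar y = \frac{1}{4}(2 + 4 + 6 + 8) = \frac{20}{4} = 5\)

The population mean has the same form, but the sum runs over all N members of the population, and it is conventionally written with the Greek letter mu:

\(\mu = \frac{1}{N} \sum_{i=1}^N y_i\)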

LINQ does provide an extension method for the mean, called Average(), but for illustrative purposes I chose to re-implement it using the LINQ Sum() and Count() extension methods.

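// Requires: using System.Collections.Generic; using System.Linq;
public static double Mean(this IEnumerable<double> data)
{
    // IsNullOrEmpty() is not part of LINQ; it is assumed to be a helper extension
    // equivalent to (data == null || !data.Any()).
    if (data.IsNullOrEmpty()) return default(double);

    // (1/n) * sum of the values, mirroring the formula above.
    var mean = (1d / data.Count()) * data.Sum();
    return mean;
}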
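
To show how the extension method might be called, here is a minimal self-contained sketch. The class names StatisticsExtensions and MeanDemo and the sample values are made up for this example. Because IsNullOrEmpty() is not a standard LINQ method, the guard is written out with Any() so the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StatisticsExtensions
{
    // Same computation as the Mean method above, with the null/empty
    // guard spelled out instead of relying on an IsNullOrEmpty() helper.
    public static double Mean(this IEnumerable<double> data)
    {
        if (data == null || !data.Any()) return default(double);
        return (1d / data.Count()) * data.Sum();
    }
}

public static class MeanDemo
{
    public static void Main()
    {
        var sample = new[] { 2d, 4d, 6d, 8d };

        Console.WriteLine(sample.Mean());         // 5
        Console.WriteLine(sample.Average());      // 5, LINQ's built-in equivalent
        Console.WriteLine(new double[0].Mean());  // 0, the default(double) fallback
    }
}
```

One difference worth noting: Average() throws an InvalidOperationException on an empty sequence, whereas this Mean() quietly returns 0. Which behavior is preferable depends on how the caller wants to treat missing data.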

 

The code for this project, NumSkull, can be found on CodePlex.