Sunday, March 13, 2016

Coefficient of variation

Write a Python function that calculates the coefficient of variation for a list of numbers.

This is trivial if we can use the numpy library, but let's do it without first. To begin, I write a dummy that always returns zero:
def cv(data):
    """
    :param data: a list of numbers
    :returns: its coefficient of variation, or NaN.
    :rtype: float
    """
    return 0.0
Now I can write a few test cases for it:
import math
import unittest


class CV(unittest.TestCase):
    def test_none(self):
        coll = None
        self.assertTrue(math.isnan(cv(coll)))

    def test_empty(self):
        self.assertTrue(math.isnan(cv([])))

    def test_zero_mean(self):
        coll = [1, 2, 0, -1, -2]
        self.assertTrue(math.isnan(cv(coll)))

    def test_std_var_0(self):
        coll = [42, 42, 42]
        self.assertEqual(cv(coll), 0)

    def test_1(self):
        coll = [0, 0, 6, 6]
        self.assertEqual(cv(coll), 1)

    def test_dot5(self):
        coll = [10, 4, 12, 15, 20, 5]
        self.assertAlmostEqual(cv(coll), 0.503, delta=0.001)
Following the line of the previous post, I have decided that my function should return NaN if the caller passes in None or an empty list. The first two test cases, test_none and test_empty, spell out this requirement.
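A side note on why these tests check the result with math.isnan() instead of assertEqual(): under IEEE 754 rules a NaN never compares equal to anything, itself included, so an equality assertion against float('NaN') could not pass. A minimal sketch of that behaviour, independent of cv():

```python
import math

nan = float('NaN')

# A NaN is not equal to any value, not even to itself.
nan_equals_itself = (nan == nan)

# math.isnan() is the reliable way to detect it.
nan_detected = math.isnan(nan)

print(nan_equals_itself, nan_detected)  # False True
```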
The other test cases should look quite clear once we know what the coefficient of variation is. In a few words, it is the standard deviation of a population divided by its mean. This implies that we can't calculate it when the mean is zero. In that case the function should return NaN, as shown by the test_zero_mean test case.
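To make the definition concrete, here is a minimal sketch that leans on the standard library's statistics module (Python 3.4 or later) rather than on the code in this post. The name cv_reference is made up for the illustration, and the NaN handling is deliberately left out. pstdev() is the population standard deviation, which is the variant the definition calls for.

```python
import statistics


def cv_reference(data):
    """Hypothetical helper: population standard deviation over mean.

    Assumes a non-empty list of numbers with a non-zero mean.
    """
    return statistics.pstdev(data) / statistics.mean(data)


# The mean is 3 and every value lies 3 away from it, so the ratio is 1.
print(cv_reference([0, 0, 6, 6]))
```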

That said, the implementation should look quite straightforward:
import math


def cv(data):
    """
    :param data: a list of numbers
    :returns: its coefficient of variation, or NaN.
    :rtype: float
    """
    # 1
    if not data:
        return float('NaN')
    # 2
    mean = sum(data) / float(len(data))
    if mean == 0:
        return float('NaN')
    # 3
    sq_sum = 0.0
    for d in data:
        sq_sum += (d - mean) ** 2
    stddev = math.sqrt(sq_sum / len(data))
    # 4
    return stddev / mean
1. When the user passes a None or an empty list, NaN is returned.
2. Calculate the mean. If it is zero, NaN is returned.
3. Calculate the standard deviation.
4. Return the coefficient of variation.
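To see steps 2 to 4 at work, here is the arithmetic for the list used in test_dot5, written out as a small stand-alone sketch (the variable names are only for illustration):

```python
import math

data = [10, 4, 12, 15, 20, 5]

# Step 2: the mean is 66 / 6 = 11.
mean = sum(data) / float(len(data))

# Step 3: the squared deviations are 1, 49, 1, 16, 81, 36, which sum to 184;
# the population standard deviation is sqrt(184 / 6), about 5.538.
sq_sum = sum((d - mean) ** 2 for d in data)
stddev = math.sqrt(sq_sum / len(data))

# Step 4: about 5.538 / 11, which rounds to the 0.503 expected by the test.
result = stddev / mean
print(round(result, 3))  # 0.503
```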

As I hinted above, using numpy makes the code so much cleaner:
import numpy


def cv_(data):
    """
    :param data: a list of numbers
    :returns: its coefficient of variation, or NaN.
    :rtype: float
    """
    if not data:
        return float('NaN')

    mean = numpy.mean(data)
    if mean == 0.0:
        return float('NaN')

    return numpy.std(data) / mean
Full code and test cases are on GitHub.
