Tuesday, April 17, 2012

When to use a logarithmic scale

When graphing data, most of what we encounter falls within a relatively narrow range: temperature, people’s weight, SAT scores, Social Security payments. Once in a while, however, you encounter data with a very wide range, such as people’s income or home values. I am graphing buildings in Manchester-by-the-Sea, plotting the date built against the assessed value.

The “regular” graph uses linear scales for both the horizontal axis (Date Built) and the vertical axis (Assessed Value). On a linear scale the change in value is constant: the same distance represents the same change at either end of the graph, both horizontally (20 years between tick marks) and vertically ($2M between tick marks).


For the dates on the graph, this works fine: I can see that the older buildings in town account for a smaller and smaller percentage as the years go by. Unfortunately, because there are a number of high-value buildings in the database (and that is after I removed three data points [$26M, $20M, and $15M]), the buildings valued at less than $2,000,000 all seem to be bunched together. It is appropriate to “stretch” the Value scale, which is done by making it a logarithmic scale.

On a logarithmic scale the change in value is not constant: the same distance represents a different change at each end of the graph. What stays constant is the ratio. Here each step multiplies the value by 10 (Base 10, “an order of magnitude”, or “moving the decimal point”). This is seen in the vertical axis: the distance from $100,000 to $1,000,000 is the same as from $1,000,000 to $10,000,000.
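To make the "same distance" claim concrete, here is a minimal Python sketch. The helper names `linear_distance` and `log_distance` are my own, made up for illustration; they simply measure how far apart two dollar values sit on each kind of axis.

```python
import math

def linear_distance(low, high):
    # On a linear axis, distance is proportional to the difference in value.
    return high - low

def log_distance(low, high):
    # On a base-10 logarithmic axis, distance is proportional to the
    # difference of the logarithms, i.e. the log of the ratio high / low.
    return math.log10(high) - math.log10(low)

pairs = [(100_000, 1_000_000), (1_000_000, 10_000_000)]
for low, high in pairs:
    print(
        f"${low:,} to ${high:,}: "
        f"linear distance {linear_distance(low, high):,}, "
        f"log distance {log_distance(low, high):.2f}"
    )
```

On the linear axis the second interval is ten times as long as the first; on the logarithmic axis both intervals are one unit (one order of magnitude) long.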

Using a logarithmic scale has the effect of “spreading apart” the values, and you get a much better sense of that clump of houses built between 1945 and 1982 and valued between $300,000 and $900,000.
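A rough way to quantify the “spreading apart” is to ask how much of the vertical axis the $300,000 to $900,000 band occupies on each scale. The sketch below assumes a linear axis running from $0 to $10,000,000 and a logarithmic axis running from $100,000 to $10,000,000; those limits are my assumptions for illustration, not values read off the actual graphs, and the function names are hypothetical.

```python
import math

def linear_fraction(low, high, axis_min, axis_max):
    # Share of a linear axis covered by the band [low, high].
    return (high - low) / (axis_max - axis_min)

def log_fraction(low, high, axis_min, axis_max):
    # Share of a logarithmic axis covered by the band [low, high].
    # Only ratios matter, so axis_min must be greater than zero.
    return math.log10(high / low) / math.log10(axis_max / axis_min)

clump_low, clump_high = 300_000, 900_000

# Assumed axis limits (illustrative only).
linear_share = linear_fraction(clump_low, clump_high, 0, 10_000_000)
log_share = log_fraction(clump_low, clump_high, 100_000, 10_000_000)

print(f"Linear axis: the band covers {linear_share:.0%} of the height")
print(f"Log axis:    the band covers {log_share:.0%} of the height")
```

Under these assumed limits the band grows from about 6% of the axis height to about 24%, which is why the clump becomes readable.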

The point of this exercise is not to find “the correct way” of doing something, but to discover new tools that allow you greater insight into your data.
