When graphing data, most of what we encounter falls within a fairly narrow range – temperature, people’s weight, SAT scores, Social Security payments. Once in a while, however, you encounter data with a very wide range – people’s income, or home values. Here I am graphing buildings in Manchester-by-the-Sea: the date built versus the assessed value.
The “regular” graph uses linear scales for both the horizontal axis (Date Built) and the vertical axis (Assessed Value). On a linear scale the change in value is constant: the same distance along the axis represents the same change in value at either end of the graph, both horizontally (20 years between each tick mark) and vertically ($2M between each tick mark).
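If you want to reproduce this kind of chart yourself, here is a minimal sketch in Python with pandas and matplotlib. The file name (mbts_buildings.csv) and the column names (YearBuilt, AssessedValue) are placeholders – substitute whatever your assessor export actually uses.

```python
# Minimal linear-scale version, assuming a hypothetical CSV export
# with columns YearBuilt and AssessedValue.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("mbts_buildings.csv")  # hypothetical file name

fig, ax = plt.subplots()
ax.scatter(df["YearBuilt"], df["AssessedValue"], s=10, alpha=0.5)

# Both axes are linear: each tick step covers the same change in value.
ax.set_xlabel("Date Built")
ax.set_ylabel("Assessed Value ($)")
ax.set_title("Manchester-by-the-Sea buildings (linear scales)")
plt.show()
```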
For the dates on the graph, this works fine – I can see that the older buildings in town account for a smaller and smaller percentage as the years go by. Unfortunately, because there are a number of high-value buildings in the database (and I have already removed three data points [$26M, $20M, and $15M]), the buildings valued at less than $2,000,000 are all bunched together. The fix is to “stretch” the Value scale, which is done by making it a logarithmic scale.
On a logarithmic scale the change in value is not constant: the same distance along the axis represents different changes in value at each end of the graph. Each step is a factor of 10 (an “order of magnitude”, or moving the decimal point one place). You can see this on the vertical axis – the distance from $100,000 to $1,000,000 is the same as the distance from $1,000,000 to $10,000,000.
Using a logarithmic scale has the effect of “spreading apart” the values, and you get a much better sense of that clump of houses built between 1945 and 1982 and valued between $300,000 and $900,000.
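The log-scale version is the same sketch with one extra line. Again, the file name and column names are placeholders, and the outlier filter below is just one way to approximate removing the three high-value points mentioned earlier.

```python
# Same plot, but with the vertical axis switched to a base-10 logarithmic scale.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("mbts_buildings.csv")  # hypothetical file name

# Roughly mirror the three extreme points dropped above ($26M, $20M, $15M).
df = df[df["AssessedValue"] < 15_000_000]

fig, ax = plt.subplots()
ax.scatter(df["YearBuilt"], df["AssessedValue"], s=10, alpha=0.5)

# Each major tick on the vertical axis is now 10x the previous one.
ax.set_yscale("log")
ax.set_xlabel("Date Built")
ax.set_ylabel("Assessed Value ($, log scale)")
ax.set_title("Manchester-by-the-Sea buildings (logarithmic value axis)")
plt.show()
```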
The point of this exercise is not to find “the correct way” of doing something, but to discover new tools that allow you greater insight into your data.