Saturday, April 28, 2012

4/28/2012 Empire of the Summer Moon

For Christmas 2011 my brother-in-law gave me a copy of Empire of the Summer Moon by S.C. Gwynne. It tells the story of the Comanches in the 1800s. Centered in the southern Great Plains, from Kansas to northern Mexico, they were the most violent of the American Indian tribes. All of the stories and vignettes in this book relate to thoughts, actions, and words of actual individuals – out of these various threads, the fabric of our nation today is woven. There is only one map in the book – a very nice one in the beginning, by Jeffrey L. Ward.
The best way to read this book is in front of a computer, with Google Maps, Google Earth, and a search engine up and running. Empire … is a story about a particular place and time in our nation’s history, and the geography is an integral part of the story – visualizing that geography helps to complete the story.

Chapter Nineteen – The Red River War

Page 275 … But the best reason to camp in the panhandle was that, in all the southern plains, there was no better place to hide. In the general vicinity of present-day Amarillo, the dead-flat Llano Estacado gave way to the rocky buttes and muscular upheavals of the caprock, where the elevation fell as much as a thousand feet. Into this giant escarpment the four major forks of the Red River had cut deep, tortuous canyons, creating some of the most dramatic landscapes in the American West. The spectacular Palo Duro Canyon, carved out over the geological aeons by the Prairie Dog Town Fork of the Red River, was a thousand feet deep, one hundred twenty miles long, between a half-mile and twenty miles wide, and crossed by innumerable breaks, washes, arroyos, and side canyons. This was long the Quahadis’ sanctuary. …
… The final campaign took the form of five mounted columns designed to converge on the rivers and streams east of the caprock. Mackenzie commanded three of them: his own crack Fourth Cavalry was to march from Fort Concho (present-day San Angelo), and probe northward from his old supply camp on the Fresh Water Fork of the Brazos; Black Jack Davidson’s Tenth Cavalry would move due west from Fort Sill; and George Buell’s Eleventh Infantry would operate in a northwesterly direction between the two. From Fort Bascom in New Mexico, Major William Price would march east with the Eighth Cavalry, while Colonel Nelson A. Miles, a Mackenzie rival and a man destined to become one of the country’s most famous Indian fighters, came south with the Sixth Cavalry and Fifth Infantry from Fort Dodge, Kansas.
I have two monitors, and I like working with the additional real estate (as opposed to an iPad or Kindle). I don’t think that I want to see a movie – but these additional interactive visualization tools make “reading” a richer experience.

Wednesday, April 18, 2012

4/18/2012 English Counts

With all the talk about data, and statistical ranges, and visualizations, and projections, let us never forget that yes, English counts. This map appeared in USA Today on April 13, 2012:


It is a wonderful map. The projection allows for minimal distortion of “the lower 48”. The color range is evocative of dryness and drought. The five colors (plus white) are easily discernible. I don’t know the underlying data distribution, but the map agrees with my prejudices about the distribution of usual weather conditions in the US in the springtime.

My only complaint is with the words used to describe the data ranges. “No drought” is good for the first range, and “Abnormally dry” and “Moderate” are fine for the next two, although I don’t really have much of a sense as to whether “Moderate” is drier than “Abnormally dry”, or not. I must take issue with the names of the three driest ranges: “Severe”, “Extreme”, and “Exceptional”. I have no idea (without being told in this graphic) whether “Exceptional” is drier than “Severe”, or whether “Extreme” is not as bad as “Exceptional”. English counts, and the words that you use to describe your data must communicate the relative level of the variable that you are measuring. Five phrases might be “Best”, “Good”, “Middle”, “Bad”, and “Worst”. Another effective method would be to include numbers in the descriptions: “5-year Drought”, “10-year Drought”, “20-year Drought”, “50-year Drought”, and “100-year Drought”. There are many phrases you can use to describe data, but please be careful – I find it a bit distressing if you want me to figure out that “Exceptional” is worse than “Extreme”.

Tuesday, April 17, 2012

4/17/2012 When to use a logarithmic scale

When graphing data, most of the data we encounter falls within a relatively narrow range – temperature, people’s weight, SAT scores, Social Security payments. Once in a while, however, you encounter data that has a very wide range – people’s incomes and home values, for example. I am graphing buildings in Manchester-by-the-Sea – the date built versus the assessed value.

The “regular” graph uses linear scales for both the horizontal axis (Date Built) and the vertical axis (Assessed Value). In a linear scale the change in value is constant: the same distance represents the same change at each end of the graph, both horizontally (20 years between each tick mark) and vertically ($2M between each tick mark).


For the dates on the graph, this works fine – I can see that the older buildings in town account for a smaller and smaller percentage as the years go by. Unfortunately, because there are a number of high-value buildings in the database (and I have already removed three data points [$26M, $20M, and $15M] from this database), the buildings valued at less than $2,000,000 seem to be all bunched together. It is appropriate to “stretch” the Value scale – this is done by making it a logarithmic scale.

In a logarithmic scale the change in value is not constant: the same distance represents a different change at each end of the graph. What stays constant is the ratio. Each major step along the axis multiplies the value by 10 (“an order of magnitude”, or “moving the decimal point”). This is seen in the vertical axis – the distance from $100,000 to $1,000,000 is the same as from $1,000,000 to $10,000,000.
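To check that equal-distance claim with numbers, here is a minimal sketch in Python. The helper name log_position and the axis limits ($100,000 to $10,000,000) are my own assumptions for illustration; they are not taken from the graphing tool used for the charts.

```python
import math

def log_position(value, axis_min, axis_max):
    """Return where a value is drawn on a base-10 log axis,
    as a fraction from 0.0 (bottom) to 1.0 (top)."""
    span = math.log10(axis_max) - math.log10(axis_min)
    return (math.log10(value) - math.log10(axis_min)) / span

# Assumed axis limits, chosen to match the dollar figures in the text.
axis_min, axis_max = 100_000, 10_000_000

p_100k = log_position(100_000, axis_min, axis_max)
p_1m = log_position(1_000_000, axis_min, axis_max)
p_10m = log_position(10_000_000, axis_min, axis_max)

# Distance on the axis covered by each tenfold jump in value.
first_decade = p_1m - p_100k
second_decade = p_10m - p_1m

print(f"$100,000 -> $1,000,000 covers {first_decade:.2f} of the axis")
print(f"$1,000,000 -> $10,000,000 covers {second_decade:.2f} of the axis")
```

Both tenfold jumps take up the same share of the axis, even though the second one covers ten times as many dollars.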

Using a logarithmic scale has the effect of “spreading apart” the values, and you can get a much better sense of the clump of houses that were built between 1945 and 1982 and are valued between $300,000 and $900,000.
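The “spreading apart” can be put into rough numbers by asking how much of the vertical axis the $300,000 to $900,000 band occupies on each kind of scale. This is only a sketch: the axis limits below ($0 to $10,000,000 for the linear axis, $100,000 to $10,000,000 for the logarithmic one) are assumed for illustration, and the function names are hypothetical.

```python
import math

def linear_share(low, high, axis_min, axis_max):
    """Fraction of a linear axis occupied by the band low..high."""
    return (high - low) / (axis_max - axis_min)

def log_share(low, high, axis_min, axis_max):
    """Fraction of a base-10 log axis occupied by the band low..high."""
    return math.log10(high / low) / math.log10(axis_max / axis_min)

# The value band from the text.
band_low, band_high = 300_000, 900_000

# Assumed axis limits (not taken from the actual charts).
share_on_linear = linear_share(band_low, band_high, 0, 10_000_000)
share_on_log = log_share(band_low, band_high, 100_000, 10_000_000)

print(f"Linear axis: {share_on_linear:.1%} of the height")
print(f"Log axis:    {share_on_log:.1%} of the height")
```

Under these assumed limits the band takes up about 6% of the linear axis and roughly 24% of the logarithmic one, which is why the clump becomes readable.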

The point of this exercise is not to find “the correct way” of doing something, but to discover new tools that allow you greater insight into your data.