Today I am happy to announce that choroplethr v3.1.0 is now on CRAN. You can get it by typing the following from an R console:
install.packages("choroplethr")
This version adds better support for summary demographic data for each state and county in the US. The data is available through two data.frames and two functions. The data.frames are:
- ?df_state_demographics: eight values for each state.
- ?df_county_demographics: eight values for each county.
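As a quick sketch of how to look at these data.frames, the snippet below loads both and inspects them. It assumes only that choroplethr is installed, and that each data.frame identifies its rows with a region column, which is the convention the package's mapping functions use.

```r
library(choroplethr)

# Load the two data.frames that ship with the package
data(df_state_demographics)
data(df_county_demographics)

# One row per region: a "region" column plus the summary statistics
str(df_state_demographics)
head(df_county_demographics)
```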
These statistics come from the US Census Bureau’s 2013 5-year American Community Survey (ACS). If you would like the same summary statistics from another ACS, you can use these two functions:
- ?get_state_demographics
- ?get_county_demographics
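A minimal sketch of requesting a different survey follows. It assumes the functions take endyear and span arguments, and that a Census API key is installed through the acs package. The CENSUS_API_KEY environment variable is a hypothetical name chosen for this example, not something choroplethr reads itself. The download only runs when that variable is set.

```r
library(choroplethr)
library(acs)

# Hypothetical environment variable holding your Census API key
key = Sys.getenv("CENSUS_API_KEY")

if (nzchar(key)) {
  # Register the key with the acs package (needed once per machine)
  api.key.install(key)

  # Same statistics, but from the 2012 5-year ACS (assumed arguments)
  df_state_2012  = get_state_demographics(endyear = 2012, span = 5)
  df_county_2012 = get_county_demographics(endyear = 2012, span = 5)

  head(df_state_2012)
}
```

The check ?get_state_demographics shows the exact arguments supported by your installed version.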
For more information on the ACS and choroplethr’s support for it, please see this page.
Relation to Previous Work
In many ways this update is a continuation of work that began with my April 7 guest blog post on the Revolution Analytics blog. In that piece (Exploring San Francisco with choroplethrZip) I explored the demographics of San Francisco ZIP Codes. Because of the interest in that piece, I subsequently released the data as part of the choroplethrZip package. This update simply brings that functionality to the main choroplethr package.
Note that caveats apply to this data. ACS data represent samples, not full counts. I simplify the Census Bureau’s complex framework for dealing with race and ethnicity by dealing with only White not Hispanic, Asian not Hispanic, Black or African American not Hispanic and Hispanic all Races. I chose simplicity over completeness because my goal is to demonstrate technology.
Explore the Data Online
You can explore this data with a web application that I created here. The source code for the app is available here. This app demonstrates some of my favorite ways of exploring demographic data:
- Using a boxplot to explore the distribution of the data
- Exploring the data at both the state and county level
- Using choropleth maps to explore geographic patterns of the data
- Allowing the user to change the number of colors used:
  - 1 color uses a continuous scale, which makes outliers easy to see
  - 2 through 9 colors put an equal number of regions in each color. For example, using 2 colors shows values above and below the median
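The same ideas can be tried from the R console. This is a sketch rather than the app's actual source: it assumes df_state_demographics has a per_capita_income column (check colnames(df_state_demographics) if yours differs) and uses the num_colors parameter of state_choropleth.

```r
library(choroplethr)

data(df_state_demographics)

# choroplethr maps expect a data.frame with "region" and "value" columns
df = data.frame(region = df_state_demographics$region,
                value  = df_state_demographics$per_capita_income)

# Boxplot: distribution of the values across states
boxplot(df$value, main = "Per capita income by state")

# 1 color: continuous scale, which makes outliers stand out
map_continuous = state_choropleth(df,
                                  title      = "Per Capita Income (continuous)",
                                  num_colors = 1)
print(map_continuous)

# 2 colors: regions are split into two equally sized groups
map_median = state_choropleth(df,
                              title      = "Per Capita Income (2 colors)",
                              num_colors = 2)
print(map_median)

# Illustration only: count of states above the median versus not above it
table(df$value > median(df$value))
```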
In my opinion, datasets like this really lend themselves to web applications because there are so many ways to visualize the data, and no single way is authoritative.
Selected Images
One of my biggest surprises when exploring this dataset was to discover its strong regional patterns. For example, the regions with the highest percentage of White not Hispanic residents tend to be in the north central and north east. The regions with the highest percentage of Black or African American not Hispanic residents are in the south east. And the regions with the highest concentration of Hispanic all Races residents are in the south west.
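State maps like these can be drawn with a few lines of R. The sketch below assumes the columns are named percent_white, percent_black and percent_hispanic. The helper map_state_column is hypothetical, written for this example, and the choice of 7 colors is an assumption rather than the setting used for the images in this post.

```r
library(choroplethr)

data(df_state_demographics)

# Hypothetical helper: map one column of df_state_demographics
map_state_column = function(column, title) {
  df = data.frame(region = df_state_demographics$region,
                  value  = df_state_demographics[[column]])
  state_choropleth(df, title = title, legend = "Percent", num_colors = 7)
}

map_white    = map_state_column("percent_white",    "Percent White not Hispanic")
map_black    = map_state_column("percent_black",    "Percent Black or African American not Hispanic")
map_hispanic = map_state_column("percent_hispanic", "Percent Hispanic all Races")

print(map_white)
print(map_black)
print(map_hispanic)
```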
Switching to counties shows us the variation within each state. And switching to a continuous scale highlights the outliers.
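A county-level version might look like the following sketch. It assumes df_county_demographics has a percent_hispanic column, and it uses the num_colors and state_zoom parameters of county_choropleth. Texas is an arbitrary choice for the zoomed map.

```r
library(choroplethr)

data(df_county_demographics)

df_county = data.frame(region = df_county_demographics$region,
                       value  = df_county_demographics$percent_hispanic)

# All US counties on a continuous scale, so outliers are easy to spot
map_counties = county_choropleth(df_county,
                                 title      = "Percent Hispanic all Races by County",
                                 legend     = "Percent",
                                 num_colors = 1)
print(map_counties)

# Zoom in on one state to see the variation within it
map_texas = county_choropleth(df_county,
                              title      = "Percent Hispanic all Races, Texas Counties",
                              legend     = "Percent",
                              num_colors = 1,
                              state_zoom = "texas")
print(map_texas)
```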