choroplethr v3.1.0: Better Summary Demographic Data

Today I am happy to announce that choroplethr v3.1.0 is now on CRAN. You can get it by typing the following from an R console:

install.packages("choroplethr")

This version adds better support for summary demographic data for each state and county in the US. The data is in two data.frames and two functions. The data.frames are:

  • ?df_state_demographics: eight values for each state.
  • ?df_county_demographics: eight values for each county.

These statistics come from the US Census Bureau’s 2013 5-year American Community Survey (ACS). If you would like the same summary statistics from another ACS you can use these two function:

  • ?get_state_demograhpics
  • ?get_county_demograhpics

For more information on the ACS and choroplethr’s support for it, please see this page.

Relation to Previous Work

In many ways this update is a continuation of work that began with my April 7 guest blog post on the Revolution Analytics blog. In that piece (Exploring San Francisco with choroplethrZip) I explored the demographics of San Francisco ZIP Codes. Because of the interest in that piece, I subsequently released the data as part of the choroplethrZip package. This update simply brings that functionality to the main choroplethr package.

Note that caveats apply to this data. ACS data represent samples, not full counts. I simplify the Census Bureau’s complex framework for dealing with race and ethnicity by dealing with only White not Hispanic, Asian not Hispanic, Black or African American not Hispanic and Hispanic all Races. I chose simplicity over completeness because my goal is to demonstrate technology.

Explore the Data Online

You can explore this data with a web application that I created here. The source code for the app is available here. This app demonstrates some of my favorite ways of exploring demographic data:

  • Using a boxplot to explore the distribution of the data
  • Exploring the data at both the state and county level
  • Using choropleth maps to explore geographic patterns of the data
  • Allowing the user to change the number of colors used:
    • 1 color uses a continuous scale, which makes outliers easy to see
    • Using 2 thru 9 colors puts an equal number of regions in each color. For example, using 2 colors shows values above and below the median

In my opinion, datasets like this really lend themselves to web applications because there are so many ways to visualize the data, and no single way is authoritative.

Selected Images

One of my biggest surprises when exploring this dataset was to discover its strong regional patterns. For example, the regions with the highest percentage White not Hispanic residents tend to be in the north central and north east. The regions with the highest percentage of Black or African American not Hispanic residents is in the south east. And the regions with the highest concentration of Hispanic all Races is in the south west:

state-white

state-black

state-hispanic

Switching to counties shows us the variation within each state. And switching to a continuous scale highlights the outliers.

county-black-continuous

county-hispanic-continuous

Ari Lamstein

Ari Lamstein

I’m a software engineer who focuses on data projects.

I most recently worked as a Staff Data Science Engineer at a marketing analytics consultancy. While there I developed internal tools for our data scientists, ran workshops on data science and mentored data scientists on software engineering.

Thanks for visiting!

Sign up to stay up to date with the latest blog posts: