Today’s post is by Kyle Walker, a professor of geography at Texas Christian University. I’ve been a fan of Kyle’s work for a while. When I saw that he wrote a package for accessing the Census Bureau’s International Data Base, I asked him to write a guest post about it.
The US Census Bureau’s International Data Base (IDB) is one of the best resources on the web for obtaining both historical estimates and future projections of international demographic indicators. I’ve long used the IDB in my teaching, generally using its web interface to download data extracts. However, the Census Bureau also makes the IDB accessible via its API, which makes it much more convenient for programmers to access the data. Earlier this year, I wrote the R package idbr (https://github.com/walkerke/idbr) to help R programmers use the IDB in their projects.
In this post, I’ll go over the basics of the idbr package and illustrate its functionality with animated GIF visualizations of IDB data using ggplot2 and gganimate.
idbr is available on CRAN and can be installed with the command install.packages('idbr'). Once installed, start up your idbr session with the following code:
[code lang=”r”]
library(idbr)
idb_api_key('Your API key goes here')
[/code]
To use the US Census Bureau API, you’ll need to get an API key from http://api.census.gov/data/key_signup.html. It is free and doesn’t take long to get. Supply the key to the idb_api_key function to set your API key as an environment variable in your R session; you only need to do this once per session.
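If you would rather not paste the key into every script, one option is to keep it in an environment variable of your own and read it at the start of a session. The sketch below is only an illustration: the helper get_census_key and the variable name CENSUS_API_KEY are hypothetical names chosen for this example, not part of idbr.

```r
# Hypothetical helper: read the API key from an environment variable
# instead of hard-coding it in a script. The variable name CENSUS_API_KEY
# is an assumption of this sketch; idbr does not require it.
get_census_key <- function(var = "CENSUS_API_KEY") {
  key <- Sys.getenv(var)
  if (!nzchar(key)) {
    stop("Set ", var, " (for example in ~/.Renviron) before calling idb_api_key()")
  }
  key
}

# Usage, once per session:
# library(idbr)
# idb_api_key(get_census_key())
```

Storing the key outside the script also makes it easier to share code without sharing the key itself.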
There are two main functions in idbr: idb1 and idb5. idb1 taps into the 1-year-age-band IDB dataset; this dataset includes population counts for single years of age, optionally for specific age ranges or by sex. At its simplest, a user can request data for a given country in a given year. Countries are identified by their FIPS 10-4 codes, which can be looked up with the countrycode package if you don’t know the code for your country of interest.
[code lang=”r”]
library(countrycode)
countrycode('Canada', 'country.name', 'fips104')
[1] "CA"
[/code]
The country code for Canada is “CA”; this can now be supplied to the idb1 function.
[code lang=”r”]
ca <- idb1('CA', 2016)
head(ca)
Source: local data frame [6 x 6]
AGE AREA_KM2 NAME POP FIPS time
(dbl) (dbl) (chr) (dbl) (chr) (dbl)
1 0 9093507 Canada 361796 CA 2016
2 1 9093507 Canada 361650 CA 2016
3 2 9093507 Canada 361299 CA 2016
4 3 9093507 Canada 360863 CA 2016
5 4 9093507 Canada 360321 CA 2016
6 5 9093507 Canada 360143 CA 2016
[/code]
The function returns a data frame with the default variables available in the 1-year dataset.
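Because the result is an ordinary data frame, the single-year counts can be summarized with standard R tools. As a minimal sketch, the six rows printed above are re-typed here (so the example runs without an API key) and grouped into 5-year bands with base R; the object names ca_head and bands are just illustrative.

```r
# The six rows printed by head(ca), re-typed so this runs without an API key
ca_head <- data.frame(
  AGE = 0:5,
  POP = c(361796, 361650, 361299, 360863, 360321, 360143)
)

# Assign each single year of age to a 5-year band: [0, 5) and [5, 10)
ca_head$BAND <- cut(ca_head$AGE, breaks = seq(0, 10, by = 5),
                    right = FALSE, labels = c("0-4", "5-9"))

# Sum the population within each band. The "5-9" band is incomplete here
# because only age 5 is present in these six rows.
bands <- aggregate(POP ~ BAND, data = ca_head, FUN = sum)
bands
```

Applied to the full ca data frame, the same approach would need breaks that cover the whole age range.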
More demographic indicators are available via the idb5 function, which includes population data by 5-year age bands as well as measurements of birth, death, and migration rates. Variables are accessible by supplying a variable name or a concept, which refers to a group of variables. To get a list of available variable and concept names, call idb_variables() or idb_concepts().
For example, we can specify the concept “Fertility rates” to get all of the fertility variables for Canada in 2016:
[code lang=”r”]
idb5('CA', 2016, concept = 'Fertility rates')
Source: local data frame [1 x 12]
ASFR15_19 ASFR20_24 ASFR25_29 ASFR30_34 ASFR35_39 ASFR40_44 ASFR45_49 GRR SRB TFR FIPS time
(dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (dbl) (chr) (dbl)
1 13.8 53.3 101.9 101.4 41.7 7.1 0.3 0.7767 1.0563 1.5972 CA 2016
[/code]
The idb5 function as called above returns age-specific fertility rates for five-year age bands, as well as the gross reproduction rate, sex ratio at birth, and total fertility rate.
These demographic indicators are even more useful, however, when you can compare them across countries or over time. You can supply a vector of years to the idb1 function to get single-year-of-age population counts over multiple years; idb5 accepts both a vector of country codes and a vector of years. From here, you can design analyses or data visualizations to examine these temporal or cross-country comparisons.
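As a rough sketch of such a comparison, the helper below requests the total fertility rate (TFR) for several countries and years in one idb5 call and returns the rows sorted by country and year. The name compare_tfr and its fetch argument are hypothetical; fetch defaults to idbr::idb5 and exists only so the helper can be tried offline with a stand-in function.

```r
# Hypothetical helper: total fertility rates for several countries and years.
# `fetch` defaults to idbr::idb5; it is an argument only so that the helper
# can be exercised without an API key by passing a stand-in function.
compare_tfr <- function(countries, years, fetch = idbr::idb5) {
  out <- fetch(country = countries, year = years,
               variables = "TFR", country_name = TRUE)
  out <- as.data.frame(out)
  # Sort by country name, then by year, and keep the columns of interest
  out[order(out$NAME, out$time), c("NAME", "time", "TFR")]
}

# With a key set via idb_api_key(), a call might look like this:
# tfr <- compare_tfr(c("CA", "US", "MX"), 2000:2016)
# ggplot2::ggplot(tfr, ggplot2::aes(x = time, y = TFR, color = NAME)) +
#   ggplot2::geom_line()
```

The long format returned here (one row per country and year) is the shape ggplot2 expects for a line chart with one line per country.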
Two animated examples with code are below. The examples use the gganimate extension to ggplot2, which wraps the animation package, so to reproduce them you’ll need ImageMagick installed on your machine and on your system PATH.
Animated population pyramid of Nigeria, 1990-2050 (projected)
[code lang=”r” collapse=”true”]
library(idbr)
library(ggplot2)
library(dplyr)
library(gganimate)
library(animation)

idb_api_key("Your key goes here")

# Male counts are negated so their bars extend to the left of zero
male <- idb1('NI', 1990:2050, sex = 'male') %>%
  mutate(POP = POP * -1,
         SEX = 'Male')

female <- idb1('NI', 1990:2050, sex = 'female') %>%
  mutate(SEX = 'Female')

nigeria <- rbind(male, female)

# frame = time produces one animation frame per year
g1 <- ggplot(nigeria, aes(x = AGE, y = POP, fill = SEX, width = 1, frame = time)) +
  coord_flip() +
  geom_bar(data = subset(nigeria, SEX == "Female"), stat = "identity", position = 'identity') +
  geom_bar(data = subset(nigeria, SEX == "Male"), stat = "identity", position = 'identity') +
  scale_y_continuous(breaks = seq(-5000000, 5000000, 2500000),
                     labels = c('5m', '2.5m', '0', '2.5m', '5m'),
                     limits = c(min(nigeria$POP), max(nigeria$POP))) +
  theme_minimal(base_size = 14, base_family = "Tahoma") +
  scale_fill_manual(values = c('#98df8a', '#2ca02c')) +
  ggtitle('Population structure of Nigeria,') +
  ylab('Population') +
  xlab('Age') +
  theme(legend.position = "bottom", legend.title = element_blank()) +
  labs(caption = 'Chart by @kyle_e_walker | Data source: US Census Bureau IDB via the idbr R package') +
  guides(fill = guide_legend(reverse = TRUE))

# gg_animate() and the frame aesthetic come from the early (pre-1.0) gganimate API
gg_animate(g1, interval = 0.1, ani.width = 700, ani.height = 600)
[/code]
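The pyramid relies on a small trick: male counts are stored as negative numbers so their bars point left, and the axis labels are then written without the minus sign. Instead of typing the labels by hand, they could be derived from the breaks. The function pyramid_label below is a hypothetical helper written for this illustration, not part of ggplot2 or idbr.

```r
# Hypothetical helper: turn signed population breaks into unsigned labels
# in millions, so negative (male) and positive (female) sides read the same.
pyramid_label <- function(x) {
  ifelse(x == 0, "0", paste0(abs(x) / 1e6, "m"))
}

breaks <- seq(-5000000, 5000000, 2500000)
pyramid_label(breaks)
# [1] "5m"   "2.5m" "0"    "2.5m" "5m"

# In the plot, the function could be passed directly to the scale:
# scale_y_continuous(breaks = breaks, labels = pyramid_label)
```

Deriving the labels this way keeps them in sync if the breaks are changed for a country with a larger or smaller population.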
Life expectancy at birth by sex in the former USSR, 1989-2016
[code lang=”r” collapse=”true”]
library(idbr)
library(dplyr)
library(ggplot2)
library(tidyr)
library(countrycode)
library(gganimate)
library(tweenr)

ctrys <- countrycode(c('Russia', 'Ukraine', 'Belarus', 'Moldova', 'Georgia', 'Kazakhstan',
                       'Uzbekistan', 'Lithuania', 'Latvia', 'Estonia', 'Kyrgyzstan',
                       'Tajikistan', 'Turkmenistan', 'Armenia', 'Azerbaijan'),
                     'country.name', 'fips104')

idb_api_key("Your API key here")

full <- idb5(country = ctrys, year = 1989:2016,
             variables = c('E0_F', 'E0_M'), country_name = TRUE)

# Order countries by female life expectancy in 1989
tmp <- full %>%
  filter(time == 1989) %>%
  arrange(E0_F)

ord <- as.character(as.vector(tmp$NAME))

# Interpolate between years with tweenr, then reshape to long format
dft <- full %>%
  mutate(diff = E0_F - E0_M, ease = 'cubic-in-out') %>%
  select(-FIPS) %>%
  rename(Male = E0_M, Female = E0_F) %>%
  tween_elements(time = 'time', group = 'NAME', ease = 'ease',
                 nframes = 500) %>%
  gather(Sex, value, Male, Female, -diff, -.group) %>%
  mutate(.group = factor(.group, levels = ord))

g <- ggplot() +
  geom_point(data = dft, aes(x = value, y = .group, color = Sex, frame = .frame),
             size = 14) +
  scale_color_manual(values = c('darkred', 'navy')) +
  geom_text(data = dft, aes(x = value, y = .group, frame = .frame,
                            label = as.character(round(dft$value, 1))),
            color = 'white', fontface = 'bold') +
  geom_text(data = dft, aes(x = 80, y = 1.5, frame = .frame,
                            label = round(dft$time, 0)), color = 'black', size = 12) +
  theme_minimal(base_size = 16, base_family = "Tahoma") +
  theme(panel.grid.major.x = element_blank(),
        panel.grid.minor.x = element_blank()) +
  labs(y = '',
       x = '',
       color = '',
       caption = 'Data source: US Census Bureau IDB via the idbr R package; chart by @kyle_e_walker',
       title = 'Life expectancy at birth in the former USSR, 1989-2016')

gg_animate(g, interval = 0.05, ani.width = 750, ani.height = 650, title_frame = FALSE)
[/code]
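The ordering step in the code above (sort the countries by female life expectancy in 1989, then use that order as factor levels so the y-axis follows it) can be seen in isolation with a toy data frame. The country labels and values below are placeholders invented for illustration, not IDB figures.

```r
# Toy data: placeholder labels and values, NOT taken from the IDB
toy <- data.frame(
  NAME = c("A", "B", "C"),
  E0_F = c(74, 70, 72)
)

# Names sorted from lowest to highest value
ord <- as.character(toy$NAME[order(toy$E0_F)])

# Using that order as factor levels controls the axis order in ggplot2
toy$NAME <- factor(toy$NAME, levels = ord)
levels(toy$NAME)
# [1] "B" "C" "A"
```

Without the explicit levels, ggplot2 would place the countries on the axis in alphabetical order.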
Please let me know if you have any feedback about the package, and let me know how you are using it! I’m on the web at http://personal.tcu.edu/kylewalker/, and on Twitter at @kyle_e_walker.