As part of my hometown_analysis project I’ve written some new functions for working with multi-year data from the American Community Survey (ACS). The functions are download_multiyear, graph_multiyear and pct_change_multiyear. This code is open source and released under the MIT License. It currently lives in utils.py in the hometown_analysis repo. While I don’t think the functions warrant a package on PyPI, I do think that they can help others doing similar projects. This post demonstrates how to use them.
The central problem these functions address is that the ACS is not designed as a time-series. So every time you analyze a table over multiple years you need to write certain boilerplate code. These functions handle a lot of those tasks for you. The intention is that these functions speed up exploratory data analysis.
Downloading Multiple Years of Data
Use ced.download in the censusdis package to download data for a single year:
import censusdis.data as ced
from censusdis.datasets import ACS5
from censusdis.states import NY
df = ced.download(
    dataset=ACS5,
    vintage=2009,
    group="B05012",
    state=NY,
    school_district_unified="12510",
)
print(df)
  STATE SCHOOL_DISTRICT_UNIFIED  B05012_001E  B05012_002E  B05012_003E  \
0    36                   12510        44953        31623        13330

             GEO_ID                                             NAME
0  9700000US3612510  Great Neck Union Free School District, New York
A few things to note about this example:
- The vintage parameter to ced.download takes a single year.
- The data we get back has “variables” for column names (ex. “B05012_001E”). We need to do some work to convert them to “Labels” (such as “Total”).
- The dataset has some columns (such as STATE) which feel a bit redundant given that we know the state we requested data about.
The API for download_multiyear is meant to mirror that of ced.download. The primary difference is that the new vintages parameter takes a list of years:
from utils import download_multiyear
df = download_multiyear(
    dataset=ACS5,
    vintages=[2009, 2014, 2019],
    group="B05012",
    state=NY,
    school_district_unified="12510"
)
df
   Total  Native  Foreign-Born  Year
0  44953   31623         13330  2009
0  45249   30096         15153  2014
0  45044   30755         14289  2019
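To illustrate the general idea of stacking several single-year downloads, here is a minimal sketch. It is not the code in utils.py. The names fetch_one_year and stack_years are hypothetical, and fetch_one_year is an offline stand-in for a real single-year call such as ced.download. It returns the numbers shown in this post.

```python
import pandas as pd

# Hypothetical stand-in for a single-year download such as ced.download.
# The values are copied from the output shown in this post.
_SAMPLE = {
    2009: {"B05012_001E": 44953, "B05012_002E": 31623, "B05012_003E": 13330},
    2014: {"B05012_001E": 45249, "B05012_002E": 30096, "B05012_003E": 15153},
    2019: {"B05012_001E": 45044, "B05012_002E": 30755, "B05012_003E": 14289},
}


def fetch_one_year(vintage):
    """Return a one-row DataFrame for the given year (illustrative only)."""
    return pd.DataFrame([_SAMPLE[vintage]])


def stack_years(vintages, fetch=fetch_one_year):
    """Fetch each year, tag it with a Year column, and stack the rows."""
    frames = []
    for vintage in vintages:
        df_year = fetch(vintage).copy()
        df_year["Year"] = vintage  # record which vintage the row came from
        frames.append(df_year)
    # Each single-year frame keeps its own index, so every row is indexed 0.
    return pd.concat(frames)


stacked = stack_years([2009, 2014, 2019])
print(stacked)
```

Because each single-year frame has one row with index 0, the stacked result repeats that index, which matches the repeated 0 in the output above.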
The result has a Year column that records the vintage of each row. download_multiyear has 3 additional parameters which have default values:
- rename_vars=True. If True then rename the columns from variables to labels. The labels from the last year are used. Only the last portion of the label (“!!” is a separator) is used and any trailing “:” is dropped.
- drop_cols=True. If True then drop columns which do not contain survey data. These tend to be geographic metadata.
- prompt=True. download_multiyear emits a warning if a variable’s label changed during the selected years. If prompt is True then users are also prompted to confirm that they want to continue with the download despite the label mismatch. In order to reduce false positives, “:” is removed when doing the comparison (e.g. “Total:” and “Total” are considered identical).
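The label handling described above can be sketched in a few lines. This is a rough illustration under stated assumptions, not the implementation in utils.py: the helper names clean_label and labels_match are hypothetical, and the sample labels are only illustrative of the “!!”-separated format.

```python
def clean_label(label):
    """Keep the last '!!'-separated part of a label and drop trailing ':'."""
    last_part = label.split("!!")[-1]
    return last_part.rstrip(":")  # rstrip removes any run of trailing colons


def labels_match(label_a, label_b):
    """Compare two labels while ignoring every ':' character."""
    return label_a.replace(":", "") == label_b.replace(":", "")


# Illustrative labels in the '!!'-separated style.
print(clean_label("Estimate!!Total:!!Native"))  # Native
print(clean_label("Estimate!!Total:"))          # Total
print(labels_match("Total:", "Total"))          # True
```

Ignoring “:” in the comparison means a change in punctuation alone between years would not trigger a mismatch warning, while a real wording change still would.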
Graphing Multiple Years of ACS Data
Putting the data in such a simple form makes it easy to write a function to graph it. That’s what graph_multiyear does:
from utils import graph_multiyear
graph_multiyear(
    df=df,
    title="Population by Nativity in Great Neck School District",
    yaxis_title="Population"
)
These graphs will render interactively on your local machine. However, I could not figure out how to make them interactive in WordPress.
y_cols allows you to render only a subset of the columns:
graph_multiyear(
    df=df,
    title="Population by Nativity in Great Neck School District",
    yaxis_title="Population",
    y_cols=["Native", "Foreign-Born"]
)
Graphing Percent Change
While Pandas has a function pct_change, it is difficult to use on our dataset because it works on all columns (including the “Year” column). Since I anticipate doing this operation multiple times in this analysis (including boilerplate code like rounding the result), I wrote the function pct_change_multiyear:
from utils import pct_change_multiyear
df = pct_change_multiyear(df)
print(df)
graph_multiyear(
    df=df,
    title="Percent Change in Population by Nativity in Great Neck School District",
    yaxis_title="Percent Change"
)
   Total  Native  Foreign-Born  Year
0    NaN     NaN           NaN  2009
0    0.7    -4.8          13.7  2014
0   -0.5     2.2          -5.7  2019
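For readers curious how such a helper could work, here is a minimal sketch built on the pandas pct_change method. It is an assumption about the approach, not the code in utils.py. The name pct_change_sketch and its year_col and decimals parameters are hypothetical. The input values are the ones shown earlier in this post.

```python
import pandas as pd


def pct_change_sketch(df, year_col="Year", decimals=1):
    """Percent change between consecutive rows, leaving the year column alone."""
    value_cols = [col for col in df.columns if col != year_col]
    # pct_change returns a fraction, so scale to percent before rounding.
    out = (df[value_cols].pct_change() * 100).round(decimals)
    # Copy the years by position so a repeated index cannot cause misalignment.
    out[year_col] = df[year_col].to_numpy()
    return out


population = pd.DataFrame(
    {
        "Total": [44953, 45249, 45044],
        "Native": [31623, 30096, 30755],
        "Foreign-Born": [13330, 15153, 14289],
        "Year": [2009, 2014, 2019],
    }
)

changes = pct_change_sketch(population)
print(changes)
```

The first row is NaN because there is no earlier year to compare against.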
Conclusion
The functions download_multiyear, graph_multiyear and pct_change_multiyear have sped up my ability to do exploratory data analysis of ACS data that involves multiple years of the same table. The code is open source and available for others to use as well. Please contact me if you have any questions.


