In a previous post I demonstrated how to use three Python functions I wrote to work with multi-year data from the American Community Survey (ACS): download_multiyear, graph_multiyear and pct_change_multiyear. A limitation of download_multiyear was that it could only download entire tables, not individual variables. I recently encountered a situation where this was problematic. As a result I just added support for downloading individual variables. This post demonstrates both the initial problem and the new solution.
To run the code in this post, clone the hometown_analysis repo.
Starting Point: A Single Year
The table we will be working with is B05006: Place of Birth for the Foreign-born Population in the United States. Here is how to download the entire table from 2019:
from censusdis.datasets import ACS5
from censusdis.states import NY

from utils import download_multiyear

df = download_multiyear(
    dataset=ACS5,
    vintages=[2019],
    group="B05006",
    state=NY,
    school_district_unified="12510",
)
df.T.sort_values(by=0, ascending=False).head(11)
The output is:
                                             0
Total                                    14289
Asia                                     11013
Eastern Asia                              5047
South Central Asia                        4701
China                                     3768
Iran                                      3723
China, excluding Hong Kong and Taiwan     2726
Year                                      2019
Europe                                    1821
Americas                                  1270
Korea                                     1179
The dataframe has 169 columns. Most of the largest values belong to aggregate columns (like “Total” or “Asia”). Only 3 countries appear in the top 11 rows (China, Iran and Korea), and the counts for individual countries drop off quickly.
The Problem: Changing Number of Variables
Since I am interested in trends over time, I tried to download the same table for additional years. However, this does not work:
df = download_multiyear(
    dataset=ACS5,
    vintages=[2009, 2014, 2019],
    group="B05006",
    state=NY,
    school_district_unified="12510",
    prompt=False,
)
print(df)
The code generates a lot of warnings, and then raises this Exception:
CensusApiException: Census API request to https://api.census.gov/data/2009/acs/acs5/variables/B05006_162E.json
The issue is that table B05006 has had a different number of variables over the years (7 variables were added in 2019). While I could update the code to handle this case, it’s not obvious to me that downloading an entire table across years is the right thing to do in the first place. A better approach is to find the particular variables you care about and select them individually.
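To make the mismatch concrete, here is a minimal sketch of how one might compare the variables available in each vintage before downloading. It is not part of download_multiyear; the function names and the tiny variable lists are hypothetical and only for illustration (a real table has well over a hundred variables per year).

```python
def common_variables(variables_by_vintage):
    """Return the sorted variable names that appear in every vintage."""
    sets = [set(names) for names in variables_by_vintage.values()]
    return sorted(set.intersection(*sets)) if sets else []


def missing_variables(variables_by_vintage):
    """Map each vintage to the variables it lacks relative to the other vintages."""
    all_names = set().union(*variables_by_vintage.values())
    return {
        vintage: sorted(all_names - set(names))
        for vintage, names in variables_by_vintage.items()
    }


# Hypothetical, heavily truncated variable lists (illustration only).
variables_by_vintage = {
    2009: ["B05006_001E", "B05006_002E"],
    2019: ["B05006_001E", "B05006_002E", "B05006_162E"],
}

print(common_variables(variables_by_vintage))
print(missing_variables(variables_by_vintage))
```

A check like this only shows which variable IDs are safe to request in every year; it says nothing about whether an ID means the same thing in each year.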
Solution: download_variables
The particular variables I was interested in are for China, Iran and Korea. By visiting this page you can see that they are B05006_049E, B05006_060E and B05006_054E. We can access them over multiple years by passing them to the new parameter download_variables:
df = download_multiyear(
    dataset=ACS5,
    vintages=[2009, 2014, 2019],
    download_variables=["B05006_049E", "B05006_060E", "B05006_054E"],
    state=NY,
    school_district_unified="12510",
)
print(df)
The output is:
...  China  Iran  Korea  Year
0     1543  4424    857  2009
0     2593  5168    918  2014
0     3768  3723   1179  2019
As a reminder, if any of these variables had carried a different label in different years, download_multiyear would have emitted a warning.
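The kind of label check described above can be sketched as follows. This is an illustration under stated assumptions, not the code inside download_multiyear: the function name find_label_changes, the variable IDs and the labels are all hypothetical.

```python
import warnings
from collections import defaultdict


def find_label_changes(labels_by_vintage):
    """Return {variable: {vintage: label}} for variables whose label is not constant."""
    seen = defaultdict(dict)
    for vintage, labels in labels_by_vintage.items():
        for variable, label in labels.items():
            seen[variable][vintage] = label
    return {
        variable: by_year
        for variable, by_year in seen.items()
        if len(set(by_year.values())) > 1
    }


# Hypothetical variable IDs and labels (illustration only).
labels_by_vintage = {
    2009: {"EXAMPLE_001E": "China", "EXAMPLE_002E": "Korea"},
    2019: {"EXAMPLE_001E": "China", "EXAMPLE_002E": "Korea (renamed)"},
}

changes = find_label_changes(labels_by_vintage)
for variable, by_year in changes.items():
    # Warn rather than fail, so the caller can decide whether the change matters.
    warnings.warn(f"{variable} has different labels across years: {by_year}")
```

Warning instead of raising leaves the decision with the analyst: some label changes are cosmetic, while others mean the variable now measures something different.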
Graphing the Results
As a reminder, we can graph the results of download_multiyear with graph_multiyear:
from utils import graph_multiyear

graph_multiyear(
    df,
    "Place of Birth for the Foreign-Born Population<br>Great Neck Unified School District",
    "Population",
)
Percent Change
We can convert the results of download_multiyear to show percent change with the function pct_change_multiyear:
from utils import pct_change_multiyear

df_percent_change = pct_change_multiyear(df)

graph_multiyear(
    df_percent_change,
    "Place of Birth for the Foreign-Born Population<br>Great Neck Unified School District",
    "Percent Change",
)
The output is:
   China  Iran  Korea  Year
0    NaN   NaN    NaN  2009
0   68.0  16.8    7.1  2014
0   45.3 -28.0   28.4  2019
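For readers who want to check the arithmetic, the percentages above can be reproduced from the counts shown earlier using plain pandas. This is a sketch of the calculation only, assuming each value is the change relative to the previous vintage, rounded to one decimal place; it is not necessarily how pct_change_multiyear is implemented.

```python
import pandas as pd

# Counts copied from the download_variables output shown earlier in the post.
counts = pd.DataFrame(
    {
        "China": [1543, 2593, 3768],
        "Iran": [4424, 5168, 3723],
        "Korea": [857, 918, 1179],
        "Year": [2009, 2014, 2019],
    }
)

# Percent change relative to the previous vintage; the first year has no
# predecessor, so its row is NaN.
pct = (counts.set_index("Year").pct_change() * 100).round(1)
print(pct)
```

For example, the China figure for 2014 is (2593 - 1543) / 1543 × 100 ≈ 68.0.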
Conclusion
These functions were created to help me explore (in Python) how the demographics of my hometown have changed over the course of my life. To learn more about this project, visit the hometown_analysis repository on GitHub. There you can find background on the project as well as analyses using these functions.
In addition to the analysis being interesting in itself, my hope is that the repository will inspire others to undertake similar projects. I often describe Census data as a national treasure that is underappreciated by the data science community. The code is released under the MIT license, which means that you can freely use it as a starting point for your own projects.