Using Python to Measure Immigration Trends

I recently finished a project that uses Python and the American Community Survey (ACS) to measure immigration in the town I grew up in. This post provides an overview of the results.

If you are interested in doing a similar analysis, I recommend using the code I used for this project as a starting point. The code is in a repo called “hometown_analysis” and you can view it here.

Why Measure Immigration?

On a recent trip to my hometown I was struck by how many of the residents said that the town had changed since I left. When pressed for details they said that immigration was the largest cause of the change.

This surprised me because I recall my hometown as having a significant wave of immigration during my childhood. Many of my friends growing up were born in Iran and came to the US as a result of the Iranian Revolution (1979) and the Iran-Iraq war (1980-88). For reference, I grew up in Great Neck, NY and left when I graduated high school in 1996.

Naively, I had never considered that after I left there might be another wave of immigration, one unrelated to the wave that occurred when I lived there. Since I have some experience analyzing Census data, I thought it would be fun to chart the rise and fall of these two waves of immigration.

How to Measure Immigration?

My first thought was to find relevant tables in the American Community Survey (ACS) and use the censusdis package to analyze the data. The ACS is run by the Census Bureau and is the largest household survey in the United States. The first 5-year ACS covers 2005-2009 and the most recent one covers 2019-2023. You are not supposed to compare 5-year estimates whose survey periods overlap, so I decided to compare the first release (2009), the last one (2023) and the one in the middle (2016).
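To make the "no overlapping years" rule concrete, here is a small sketch that checks whether 5-year releases overlap. It assumes each 5-year ACS is labeled by its final year and covers the five calendar years ending in that year; the function names are made up for illustration.

```python
from itertools import combinations


def survey_years(vintage, span=5):
    """Calendar years covered by a 5-year ACS labeled by its final year."""
    return range(vintage - span + 1, vintage + 1)


def periods_overlap(vintage_a, vintage_b, span=5):
    """True if the two releases share at least one survey year."""
    return bool(set(survey_years(vintage_a, span)) & set(survey_years(vintage_b, span)))


vintages = [2009, 2016, 2023]

for vintage in vintages:
    years = survey_years(vintage)
    print(f"{vintage}: {years[0]}-{years[-1]}")

# Every pair of chosen releases should be free of shared survey years.
for a, b in combinations(vintages, 2):
    print(f"{a} vs {b}: overlap = {periods_overlap(a, b)}")
```

By the same check, 2009 and 2013 would overlap (both include 2009), which is why releases that close together should not be compared.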

In terms of geographies, I was surprised to learn that the Census Bureau publishes data at the level of “School Districts”. Growing up, my school district was “Great Neck Union Free School District, New York”, to which Census assigns the ID 12510.

A major challenge when dealing with Census data is figuring out which tables to use. I settled on using three:

  • B05012: Nativity in the United States
  • B05006: Place of Birth for the Foreign-Born Population
  • B02001: Race
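For readers who want to try this, here is a rough sketch of how these three tables could be requested with censusdis. It is not the code from the hometown_analysis repo. It assumes censusdis is installed, that the district is queried as a unified school district within New York (state FIPS code 36), and that the geography keyword is spelled `school_district_unified`; check both against the censusdis documentation before relying on them. The helper names are hypothetical.

```python
ACS5 = "acs/acs5"  # Census API name of the 5-year ACS dataset
VINTAGES = (2009, 2016, 2023)
TABLES = {
    "B05012": "Nativity in the United States",
    "B05006": "Place of Birth for the Foreign-Born Population",
    "B02001": "Race",
}
# Assumption: Great Neck is exposed as a "unified" school district, and
# censusdis accepts this keyword spelling for that geography level.
GEOGRAPHY = {"state": "36", "school_district_unified": "12510"}


def download_table(group, vintage):
    """Download all variables in one table for one release as a DataFrame."""
    # Imported here so the constants above can be inspected without censusdis.
    import censusdis.data as ced

    return ced.download(ACS5, vintage, group=group, **GEOGRAPHY)


def download_all():
    """Fetch every (table, release) combination; this calls the Census API."""
    return {
        (group, vintage): download_table(group, vintage)
        for group in TABLES
        for vintage in VINTAGES
    }
```

Calling `download_all()` would make nine requests (three tables for each of three releases) and return the results keyed by table and year.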

Below is an analysis of each of those tables.

Nativity

“Nativity” is the term Census uses to say whether a resident was born in the US or not. I was surprised to see that about a third of the current residents are “Foreign-Born”: that number is higher than I expected, and higher than I recall it being when I was a child. Nonetheless, it has increased only slightly since 2009 (from 30% to 32%). So whatever caused people to say that the town had changed is not reflected in this statistic.

If you are interested in seeing the code that generated this graph, please click here.

Place of Birth for the Foreign-Born Population

Census also lets us see which country immigrants were born in. As the following graph shows, this has changed significantly in the last 14 years. I think that this is what residents meant when they told me that the town had changed.

Back in 2009 Iran was, by far, the largest source of immigration to Great Neck. But the number of residents who were born there decreased 25% over the next 14 years.

At the same time, the number of residents from China increased 170%. They ended the period as the largest group of immigrants.

This change in the composition of the Foreign-Born population was probably something that no one expected in 2009. It is remarkable that this change happened while the percentage of the town that is Foreign-Born increased only slightly!

If you’d like to see the code I wrote that analyzed the Place of Birth for the Foreign-Born Population, click here.

Race

Finally, I wondered how the increased immigration from Eastern Asia changed the racial composition of the town over time.

During this period the percentage of the population that was “Asian alone” increased from 11% to 27% (an increase of about 2.5x). The percentage that was “White alone” decreased from 84% to 63% (a drop of 21 percentage points, or 25% in relative terms). Changes this large in the racial composition are probably easy for residents to notice, and are likely something they had in mind when they said that the neighborhood had changed.
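Because percentage-point changes and relative changes are easy to mix up, here is a short sketch of the arithmetic behind those figures. It uses only the rounded percentages quoted above, so the results are approximate; the function name is made up for illustration.

```python
def describe_change(start_pct, end_pct):
    """Return (percentage-point change, relative change, end/start ratio)."""
    point_change = end_pct - start_pct
    relative_change = point_change / start_pct
    ratio = end_pct / start_pct
    return point_change, relative_change, ratio


# Rounded shares of the population in 2009 and 2023, as quoted in the text.
shares = {
    "Asian alone": (11, 27),
    "White alone": (84, 63),
}

for label, (start, end) in shares.items():
    points, relative, ratio = describe_change(start, end)
    print(f"{label}: {points:+d} points, {relative:+.0%} relative, {ratio:.2f}x")
```

For these inputs the “Asian alone” share rises 16 points (a ratio of about 2.45), while the “White alone” share falls 21 points, which is a 25% relative drop.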

The code I wrote to analyze Race is available here.

What about your Hometown?

One of my goals is to have the hometown_analysis repo serve as a starting point for people who want to do similar analyses (for example, applying the same analysis to their own hometown or looking at slightly different variables). If you wind up doing this, I would love to know. You can contact me here.

Future Work

There are two ways I can imagine building on this project:

  1. Adding in data from before 2009. For example, the 2000 Decennial Census.
  2. Building a web app that lets people apply the same analysis to other geographies. This would be similar to the Covid Demographics Explorer I created last year.

If you would like to be notified about future development then please subscribe to my newsletter. The signup form is on the bottom of this page.

While I have disabled comments on my blog, I welcome hearing from readers. Use this form to contact me.

Ari Lamstein

I’m a software engineer who focuses on data projects.

I most recently worked as a Staff Data Science Engineer at a marketing analytics consultancy. While there I developed internal tools for our data scientists, ran workshops on data science and mentored data scientists on software engineering.

Thanks for visiting!
