New Release: Covid Demographics Explorer v2

Update 6/9/25: I have recorded a video walkthrough of this post. You can watch it here.

I recently published a new version of my Covid Demographics Explorer app. I encourage you to try it out!

This version adds data from the 2023 American Community Survey (ACS) 1-year estimates, and has several other improvements.

Updated Visualizations

People who saw my talk about this project know what led me to start it: during Covid the apartment building I live in essentially emptied out. So many people left the building that I felt self-conscious that I had stayed. I wanted to compare my experience during Covid to official statistics.

The first graph in the app still defaults to a time series of San Francisco’s population, although now there is a dashed line to indicate that data from 2020 is missing. I also use color to highlight which years are pre- and post-Covid:

The official statistics generally match my recollection: SF had a large population drop during Covid. What surprises me is that the population hasn’t returned to pre-Covid levels (to me, San Francisco feels as busy as it did pre-Covid).
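If you want to build a similar chart for your own county, here is a minimal matplotlib sketch of the idea. The population figures below are placeholders, not the app’s actual data or code:

    import matplotlib.pyplot as plt

    # Placeholder ACS 1-year population estimates for San Francisco.
    # 2020 is absent because that year's data is missing from the dataset.
    years = [2017, 2018, 2019, 2021, 2022, 2023]
    population = [870_000, 875_000, 880_000, 815_000, 808_000, 810_000]

    fig, ax = plt.subplots()

    # A dashed segment bridges the gap where the 2020 estimate is missing.
    ax.plot([2019, 2021], [population[2], population[3]], linestyle="--", color="gray")

    # Solid lines, colored by era, for the years we do have.
    ax.plot(years[:3], population[:3], marker="o", color="tab:blue", label="Pre-Covid")
    ax.plot(years[3:], population[3:], marker="o", color="tab:orange", label="Post-Covid")

    ax.set_xlabel("Year")
    ax.set_ylabel("Population")
    ax.set_title("San Francisco Population (ACS 1-Year Estimates)")
    ax.legend()
    plt.show()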

After seeing this graph, people often make two comments:

  • This graph makes SF’s post-Covid population drop seem dramatic. Perhaps if you expressed the change on a percentage basis it would seem smaller. 
  • Lots of counties experienced a population drop during Covid. Can you compare what happened in SF to what happened elsewhere?

I wanted to create a single visualization that would answer both of these questions. I tried several things but was not happy with the results. Then, on LinkedIn, Brent Brewington recommended I try a “swarm plot”. I had never heard of this visualization before, but I think it is the right tool for the job. (If you like technical discussions like this, please connect with me on LinkedIn.)

In the swarm plot below each county is represented by a dot. Its position along the x-axis indicates how its population has changed, on a percentage basis, since Covid. The y-axis is meaningless: it’s just used to prevent the dots from overlapping. Because San Francisco appears so far to the left, you can see that it’s an outlier. In fact, of the 813 counties in this dataset, only two had a larger decrease.
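If you’d like to experiment with swarm plots yourself, seaborn’s swarmplot function is one way to make them. Here is a rough sketch that assumes a DataFrame with one row per county; the county names and numbers are made up for illustration:

    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

    # Hypothetical data: one row per county, with its percentage population
    # change since Covid.
    df = pd.DataFrame(
        {
            "county": ["San Francisco, CA", "County B", "County C", "County D"],
            "pct_change": [-7.5, 0.3, 1.2, -0.8],
        }
    )
    df["highlight"] = df["county"] == "San Francisco, CA"

    # Each county becomes one dot. Only the x position carries meaning; the
    # y direction just spreads the dots out so they don't overlap.
    ax = sns.swarmplot(data=df, x="pct_change", hue="highlight")
    ax.set_xlabel("Population change since Covid (%)")
    plt.show()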

Another Example

Taken together, these two graphs are powerful. They allow you to quickly see not just how a county changed, but also how that change compares to what happened in every other county.

Consider the number of people who work from home. Between 2019 and 2023 San Francisco saw a 183% increase in the number of people who work from home. This is a big increase! But as the swarm plot shows, this change is similar to what happened in most other counties.

Technical Changes

The initial version of the Covid Demographics Explorer, which I released last summer, was the first project I completed in Python. One year later, I’ve grown a lot technically. So when I reviewed the code, I saw opportunities to improve it.

Continuous Integration (CI)

Last fall I took Matt Harrison’s Professional Python course. One week we did a deep dive into Python’s linting ecosystem. We started by learning about tools like black, flake8, and ruff. Then we learned that it’s popular to run these tools on every pull request using GitHub Actions.

Longtime readers will know that shortly after taking Professional Python I began contributing to the censusdis package. That repo uses these tools in the same way that Matt taught them. So I decided to set up a CI workflow for the project’s repo to do the same thing.

If you’re interested in setting up something similar, a good place to start is my repo’s lint.yml file. First copy it to the same path in your own repo. Then enable GitHub Actions on the repo.
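For reference, a minimal lint workflow looks roughly like the sketch below. My actual lint.yml may differ in its details (which tools it runs, which Python version it uses), so treat this as an illustration of the pattern rather than a copy of my file:

    # .github/workflows/lint.yml
    name: lint

    on: [pull_request]

    jobs:
      lint:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v4
          - uses: actions/setup-python@v5
            with:
              python-version: "3.12"
          - run: pip install ruff
          - run: ruff check .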

uv vs. requirements.txt

The initial version of this project used pip and requirements.txt to list its dependencies. This is still probably the most popular way to manage project dependencies in Python.

But in Professional Python we learned about uv, a newer and faster way to manage Python environments. uv has only grown in popularity since I took the course; I continue to be amazed at how frequently I hear it discussed at meetups and on LinkedIn. So I decided to port this project to uv as well.

If you’d like help getting started with uv, check out the instructions I wrote in DEVELOPER.md. They explain how to install uv locally and recreate my virtual environment. The final step is running the app locally; if that works, then everything is set up correctly.
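The authoritative steps are in DEVELOPER.md, but the core uv workflow is short. Here is a rough sketch (the entry point on the last line is a placeholder; use whatever command DEVELOPER.md gives):

    # Install uv (see https://docs.astral.sh/uv/ for other installation options)
    curl -LsSf https://astral.sh/uv/install.sh | sh

    # Recreate the project's virtual environment from its lockfile
    uv sync

    # Run a command inside that environment, e.g. launching the app
    uv run python app.py   # placeholder entry point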

Directory Structure

This version increased the number of files in the project quite a bit. One way I addressed this was by moving all the files related to data generation into a separate subdirectory. My first attempt failed because the data generation script tried to import a function from a file one level up (in the project’s root directory). I kept getting errors like ImportError: attempted relative import with no known parent package.

This seemed like such a common thing to do that I assumed I must be missing something obvious. But after reading this Stack Overflow question (which has 2.1 million views!), I realized that this is a very common problem for Python programmers. Some of the comments (such as this one) express a lot of frustration at how hard Python makes it.

In case it helps someone with a similar problem, here is what I wrote in my PR that fixed the issue:

Remove dependencies the data directory had on its parent directory.

It appears that the way Python works, life is easier if a script in a subdirectory (such as data/gen_county_data.py) does not import anything from a parent directory. Note that gen_county_data.py is a script (meaning it is meant to be run on the command line), and not a module (meaning it is not meant to be imported into another script).

In practice this meant moving the get_unique_census_labels function from /backend.py to /data/census_vars.py.

This required turning the data directory into a package (i.e. adding an __init__.py file to it) so that the function could be accessed by modules in the parent directory.

This allows me to continue to run the data generation script as “python gen_county_data.py” from the command line in the directory where it exists. It appears that the old design might have required me to run the script from a directory above and as a module (i.e. with “python -m”).
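To make that concrete, here is a simplified sketch of the layout after the fix. The file tree and docstring are illustrative; only the names mentioned above come from the actual repo:

    # Simplified layout after the fix:
    #
    #   backend.py
    #   data/
    #       __init__.py          # makes data/ an importable package
    #       census_vars.py       # new home of get_unique_census_labels
    #       gen_county_data.py   # script, run as "python gen_county_data.py" from data/
    #
    # --- data/census_vars.py ---
    def get_unique_census_labels():
        """Return the unique census variable labels used by the app (body elided)."""
        ...

    # --- data/gen_county_data.py (a script, run from inside data/) ---
    # A plain same-directory import now works:
    #     from census_vars import get_unique_census_labels

    # --- backend.py (in the parent directory) ---
    # The parent imports from the data package instead of the other way around:
    #     from data.census_vars import get_unique_census_labels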

Migration to censusdis v1.4.0

Early in the development of this project I learned a hard lesson about ACS data: variables can change meaning over time. For example, Census currently uses variable B08006_017E to count the number of people who work from home. But in 2005 they used that variable to count the number of people who commute by motorcycle! (That year they used B08006_021E to count people who work from home).

In the initial version of the app I added a lot of code to my data generation script to catch errors where a variable unexpectedly changes meaning over time. A few months ago I contributed that code to the censusdis package, so that others can easily use it.
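The exact API now lives in censusdis, but the underlying idea is simple: before trusting a variable, look up its official label for every year you plan to download and confirm that the label matches what you expect. Here is a conceptual sketch that queries the Census API’s variable metadata directly with requests; it illustrates the idea rather than reproducing the code in censusdis:

    import requests

    def label_for(year: int, variable: str) -> str:
        """Look up the official label of an ACS 1-year variable for a given year."""
        url = f"https://api.census.gov/data/{year}/acs/acs1/variables/{variable}.json"
        response = requests.get(url)
        response.raise_for_status()
        return response.json()["label"]

    # A variable is only safe to use across years if its label never changes.
    expected = label_for(2023, "B08006_017E")  # label mentions working from home
    for year in [2019, 2021, 2022, 2023]:
        label = label_for(year, "B08006_017E")
        if label != expected:
            raise ValueError(f"B08006_017E means something different in {year}: {label}")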

In this version of the Covid Demographics Explorer I refactored the data generation script to use the error checking code that’s now in censusdis. The result was pretty significant: in addition to deleting an entire module, the data generation script went from 130 to 69 lines. Even better, I think the script is now much easier to read (link).

Closing Thoughts

First of all, I hope that this app helps you better understand how your county has changed since Covid. I personally found that looking at Census data for San Francisco, and comparing it to what happened in other counties, gave me a new perspective on how Covid changed my city.

Second of all, this project is open source and released under a very permissive license. I hope that the app inspires at least one person to modify it to answer their own questions about how America has changed over time. If you wind up using it as a starting point for another project, please drop me a line!

Finally, if you enjoy the app, please consider giving the repo a star on GitHub! I just started a job search, and I’ve heard that having “popular” repos on GitHub can help with that.

While comments on my blog are closed, I welcome hearing from readers. You can contact me here.

Ari Lamstein

I’m a software engineer who focuses on data projects.

I most recently worked as a Staff Data Science Engineer at a marketing analytics consultancy. While there I developed internal tools for our data scientists, ran workshops on data science and mentored data scientists on software engineering.

Thanks for visiting!
