One of my career goals is to get as fluent with Python as I am with R. Since I don’t use Python at work, my efforts so far have focused on online courses and weekly exercise services. Courses came first (you need to learn the fundamentals somehow, after all). Then came weekly exercises (that’s where fluency is obtained: deliberate practice over time). Now, I think, I’m ready to try something new: taking on an interesting side project.
Long-time readers will not be surprised by my first idea for a side project: Building an app for exploring US census data. This would be akin to some of my most popular open source projects in R, which were Census explorers built using Shiny and Choroplethr. To complete this project I will have to learn two new skills:
One way that Python differs from R is the size of the ecosystem. In R, if you want to convert your analytical code to a website you really only have one choice: Shiny. When learning Python I asked several people whether they created data apps and, if so, which framework they used. The most common responses were Streamlit, Dash and Flask. Interestingly, even though Posit is currently porting Shiny to Python, I did not encounter anyone who used it.
I decided to learn Streamlit in part because Snowflake recently acquired them, and Snowflake is another technology that’s on my list of “modern data technologies to learn”. It seems that Snowflake is building an ecosystem around Python, and so learning Streamlit now might make it easier to learn the rest of Snowflake’s ecosystem later.
At a recent Meetup I met a few people who work for Streamlit and asked them the best way to learn it. They pointed me to the 30 Days of Streamlit challenge. I completed the challenge, and found it to be helpful in learning the basics. You can see a toy app I created for one of their challenges (including the source code) here.
The next step was to learn the best way to work with Census data in Python. There are a number of census-related Python packages, but I did not get a sense that there was a clear “winner” in the ecosystem. Again, this is different from the R world, where pretty much everyone uses tidycensus. Luckily Kyle Walker (the creator of tidycensus) pointed me to the censusdis package.
Learning the basics of censusdis has been on my todo list for a while. But it was only yesterday that I made time to watch the 90-minute video of Darren Vengroff speaking about it at a conference. The talk is so good that it inspired me to write this blog post today!
In short, it looks like I now have the building blocks to take on my first choroplethr-like side project in Python!
Since I began writing about my experience learning Python, something surprising has happened: a number of people (mostly experienced R programmers) have asked me the best way to learn Python. These people tend to be in a position similar to mine: they like R, and don’t feel limited by it at all. But they see Python increasingly being used on data-related projects, and don’t want to miss out on opportunities just because they don’t know Python.
My advice is two-fold:
The “multi-stage journey” part of learning Python is what surprised me the most. Here are the various stages I see myself having passed through since I began learning Python:
I realize that this sounds like a lot of work. It’s certainly more work than the websites that offer to teach you Python in 5 minutes would suggest. But many of my readers are advanced R users who are working on serious data projects. They’ve spent years developing their skills, and expect learning a new tool to be a serious investment of time as well. (And simply “kinda” knowing a tool isn’t very helpful to them.)
In the meantime, I’m happy to have learned enough Python and Pandas to begin work on a Census explorer in Python, and I look forward to sharing it here when I’m done!