One of my career goals is to get as fluent with Python as I am with R. Since I don’t use Python at work, my efforts so far have focused on online courses and weekly exercise services. Courses came first (you need to learn the fundamentals somehow, after all). Then came weekly exercises (that’s where fluency is obtained: deliberate practice over time). Now, I think, I’m ready to try something new: taking on an interesting side project.
Long-time readers will not be surprised by my first idea for a side project: Building an app for exploring US census data. This would be akin to some of my most popular open source projects in R, which were Census explorers built using Shiny and Choroplethr. To complete this project I will have to learn two new skills:
- Building data apps in Python
- Working with census data in Python
Building Data Apps in Python
One way that Python differs from R is the size of the ecosystem. In R, if you want to convert your analytical code to a website you really only have one choice: Shiny. When learning Python I asked several people if they created data apps and if so, which framework they used. The most common responses were Streamlit, Dash and Flask. Interestingly, even though Posit is currently porting Shiny to Python, I did not encounter anyone who used it.
I decided to learn Streamlit in part because Snowflake recently acquired them, and Snowflake is another technology that’s on my list of “modern data technologies to learn”. It seems that Snowflake is building an ecosystem around Python, and so learning Streamlit now might make it easier to learn the rest of Snowflake’s ecosystem later.
At a recent Meetup I met a few people who work for Streamlit and asked them the best way to learn it. They pointed me to the 30 Days of Streamlit challenge. I completed the challenge, and found it to be helpful in learning the basics. You can see a toy app I created for one of their challenges (including the source code) here.
Working with Census Data in Python
The next step was to learn the best way to work with Census data in Python. There are a number of census-related Python packages, but I did not get a sense that there was a clear “winner” in the ecosystem. Again, this is different than the R world, where pretty much everyone uses tidycensus. Luckily Kyle Walker (the creator of tidycensus) pointed me to the censusdis package.
Learning the basics of censusdis has been on my to-do list for a while. But it was only yesterday that I made time to watch the 90-minute video of Darren Vengroff speaking about it at a conference. The talk is so good that it inspired me to write up this blog post today!
In short, it looks like I now have the building blocks to take on my first choroplethr-like side project in Python!
Advice to Newbies
Since I began writing about my experience learning Python, something surprising has happened: a number of people (mostly experienced R programmers) have asked me the best way to learn Python. These people tend to be in a similar position to mine: they like R, and don’t feel limited by it at all. But they see Python increasingly being used on data-related projects, and don’t want to miss out on opportunities just because they don’t know Python.
My advice is two-fold:
- Set the bar high! Why not aim to learn Python just as well as you know R? Being on the cutting edge of a field is more exciting than just having passing familiarity with the basics.
- Recognize that this is a multi-stage journey.
The “multi-stage journey” part of learning Python is what surprised me the most. Here are the various stages I see myself having passed through since I began learning Python:
- Learn the basics of Python. Unlike R, Python was not designed from the ground up to be used for data analysis. This means that before you learn data science in Python, you need to learn the basics of the language. The place I started with this, and which I’d recommend to anyone, is this course by Reuven Lerner.
- Obtain fluency with the basics of Python. After you complete whatever introductory course you take, you will naturally start forgetting what you learned. You want the opposite: to solidify your knowledge and continue to learn more. The best resource I found for that is Reuven Lerner’s Weekly Python Exercises (WPE). Each level gives you one exercise a week over the course of 16 weeks on a specific aspect of Python. I’m currently enrolled in WPE B1: Advanced Topics 1, although there are three cohorts at the beginner level.
- Learn the basics of Pandas. After getting comfortable with the basics of Python you should start learning Pandas. Pandas is the most popular Python library for manipulating and visualizing tabular data. Unfortunately, no course on this really clicked for me, so I cannot give a recommendation from personal experience. But I know that Matt Harrison just published Effective Pandas 2, which has gotten good reviews.
- Obtain fluency with the basics of Pandas. Regardless of how you learn the basics of Pandas, I think you will likely face the same problem you had when you finished your first course on basic Python: you will naturally forget what you just learned, just when you want to be solidifying it and learning more. My solution to that has been a subscription to Bamboo Weekly, which is a weekly exercise service focused on Pandas. I’m still not where I want to be with this, but each week is bringing me closer to my goal level.
- Complete an interesting side project. As the maps say: “You are here” 🙂
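To make the first stage a bit more concrete for R users: plain Python has no built-in data frame, so even a grouped total is written with core language features such as dictionaries and loops. Here is a minimal sketch of that idea. The function name `total_by_region` is just something I made up for illustration, and the region names and counts are placeholders, not real Census figures.

```python
from collections import defaultdict

# Placeholder rows: the region names and counts are made up for illustration,
# not real Census figures.
rows = [
    {"region": "North", "county": "A", "population": 100},
    {"region": "North", "county": "B", "population": 250},
    {"region": "South", "county": "C", "population": 400},
]


def total_by_region(records):
    """Sum the population of each region using only core Python."""
    totals = defaultdict(int)  # regions not seen yet start at 0
    for record in records:
        totals[record["region"]] += record["population"]
    return dict(totals)


totals = total_by_region(rows)
for region, population in sorted(totals.items()):
    print(f"{region}: {population}")
```

In Pandas the same summary is usually a single groupby call, something like `df.groupby("region")["population"].sum()`. Being able to read and write the long-hand version is a large part of what the first two stages buy you.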
I realize that this sounds like a lot of work. It’s certainly more work than the websites that offer to teach you Python in 5 minutes. But many of my readers are advanced R users who are working on serious data projects. They’ve spent years developing their skills, and expect learning a new tool to be a serious investment of time as well. (And simply “kinda” knowing a tool isn’t very helpful to them.)
In the meantime, I’m happy to have learned enough Python and Pandas to begin work on a Census explorer in Python, and I look forward to sharing it here when I’m done!