This week I embark on an exciting new project: taking Databricks’ official “Data Engineering with Databricks” course! At the start of this project I find myself wanting to connect with people who have already taken this course. I am particularly interested to know how much effort the course took and how well it prepared you for “real” data engineering tasks at work. If this sounds like you then please contact me so we can correspond via email or schedule a short online chat!
This project came about because last month my company started using Databricks for a project I work on. I need to use it for non-trivial tasks, and I found that “figuring it out as I go” was leading to a lot of frustration and inefficiency on my end. I proposed taking a course to accelerate my learning and help me learn best practices right from the start. My company graciously agreed.
The most difficult part of this project was deciding which course to take. Databricks offers a ton of courses. Their main “Introduction to Data Engineering” course appears to be this two-day course. Here is how they describe it:
Data professionals from all walks of life will benefit from this comprehensive introduction to the components of the Databricks Lakehouse Platform that directly support putting ETL pipelines into production. You will leverage SQL and Python to define and schedule pipelines that incrementally process new data from a variety of data sources to power analytic applications and dashboards in the Lakehouse. This course offers hands-on instruction in Databricks Data Science & Engineering Workspace, Databricks SQL, Delta Live Tables, Databricks Repos, Databricks Task Orchestration, and the Unity Catalog.
Day 1
- Introduction to Databricks Lakehouse Platform, Workspace, and Services
- Delta Lake
- Relational entities on Databricks
- ETL with Spark SQL
- Just enough Python for Spark SQL
- Incremental data processing with Structured Streaming and Auto Loader
Day 2
- Medallion architecture in the data lakehouse
- Delta Live Tables
- Task orchestration with Databricks Jobs
- Databricks SQL
- Managing Permissions in the lakehouse
- Productionizing dashboards and queries on Databricks SQL
Sounds good! But after speaking with someone at Databricks I learned that they have what appears to be an even better option: a subscription to their new “Blended Learning” service. Whereas the in-person version of the course takes two days, the Blended Learning version takes four weeks. Each week you have a 1-hour lesson with a teacher, are assigned a lab, and have a 1-hour walkthrough of the lab with a TA. You also go through the entire course with a cohort and have a private forum just for your cohort. Personally, I am looking to retain and apply as much as possible, and I think that spreading the course out over a month will help with that.
The best part? The subscription lasts for one year. So after taking this course I can take other Blended Learning courses in their catalog at no additional charge.
Again, if you’ve taken this course before (or switched to a Data Engineering role from another role), I’d like to speak to you about your experience. You can contact me here.