Data Engineering Zoomcamp: Week 1 Recap
Welcome to the inaugural edition of the Data Engineer Camp newsletter!
Organised by DataTalks.Club and curated by Alexey Grigorev, this free annual camp offers a certification and a vibrant community for data enthusiasts like yourself. As we embark on this collective journey, I’m thrilled to share the highlights of our action-packed Week 1, along with my code.
Local Development Environment:
Our camp initiation began with a crucial step: creating a robust local development environment. This environment serves as the backbone of our data engineering escapades, providing the perfect playground for our coding endeavours. Personally, I use macOS and Visual Studio Code, with extensions installed for Python, Jupyter Notebook, Docker, Terraform, and PostgreSQL. I won’t delve into installations here, as the course covers everything you need.
Docker:
Ever wished for a seamless and portable environment? Docker comes to the rescue by packaging everything an application needs into a container: a kind of box that holds the OS userland, system-level libraries, Python and more, while sharing the host machine’s kernel.
We explored the world of containerisation, ensuring our PostgreSQL database and other dependencies are just a ‘docker-compose up’ away. For SQL practice, we used a local Postgres database before taking our skills to the cloud.
Python and Pandas:
With our environment in place, we swiftly moved on to importing data into PostgreSQL. Python and Pandas played the role of our trusty sidekicks, turning what could be a cumbersome task into a breeze.
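To give a feel for that import step, here is a minimal sketch of loading a CSV into a database table in chunks with Pandas. It is not the exact course script: the `trips` table, the sample columns and the `load_csv_in_chunks` helper are made up for illustration, and an in-memory SQLite database stands in for PostgreSQL so the snippet runs anywhere. Against the local Postgres container you would, I assume, pass a SQLAlchemy engine as `con` instead.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical sample data standing in for a real CSV file on disk.
SAMPLE_CSV = """trip_id,pickup_datetime,passenger_count,total_amount
1,2021-01-01 00:15:00,1,11.80
2,2021-01-01 00:40:00,2,4.30
3,2021-01-01 01:05:00,1,51.95
4,2021-01-02 09:30:00,3,36.35
5,2021-01-02 10:00:00,1,8.50
"""


def load_csv_in_chunks(csv_source, con, table, chunksize=2):
    """Read a CSV in chunks and append each chunk to `table`.

    Returns the total number of rows written.
    """
    total = 0
    # With chunksize set, read_csv yields DataFrames of up to `chunksize`
    # rows each, so a large file never has to fit in memory at once.
    for chunk in pd.read_csv(csv_source, chunksize=chunksize):
        # if_exists="append" creates the table on the first chunk
        # and adds rows to it on every later chunk.
        chunk.to_sql(table, con, if_exists="append", index=False)
        total += len(chunk)
    return total


# Assumption: SQLite in memory replaces PostgreSQL for this illustration.
con = sqlite3.connect(":memory:")
rows_loaded = load_csv_in_chunks(io.StringIO(SAMPLE_CSV), con, "trips")
row_count = con.execute("SELECT COUNT(*) FROM trips").fetchone()[0]
print(f"loaded {rows_loaded} rows, table now holds {row_count}")
```

Chunking matters less for five rows than for a file with millions, but the loop is the same either way.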
SQL Puzzles Unraveled:
Our SQL skills faced a robust test as we navigated through some tricky questions. There’s nothing quite like the satisfaction of crafting the perfect SQL query to extract the precise information you need! Check out a couple of examples below:
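The queries here are stand-ins rather than the actual homework questions: a small sketch on a made-up `trips` table, run through Python’s built-in `sqlite3` module so it works without a Postgres server. The table name, columns and rows are all assumptions for illustration, though the SQL itself should carry over to PostgreSQL unchanged.

```python
import sqlite3

# Hypothetical table and rows, invented for this illustration.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE trips ("
    "pickup_date TEXT, passenger_count INTEGER, total_amount REAL)"
)
con.executemany(
    "INSERT INTO trips VALUES (?, ?, ?)",
    [
        ("2021-01-01", 1, 11.80),
        ("2021-01-01", 2, 4.30),
        ("2021-01-01", 1, 51.95),
        ("2021-01-02", 3, 36.35),
        ("2021-01-02", 1, 8.50),
    ],
)

# Puzzle 1: how many trips were taken on each day?
trips_per_day = con.execute(
    """
    SELECT pickup_date, COUNT(*) AS trips
    FROM trips
    GROUP BY pickup_date
    ORDER BY pickup_date
    """
).fetchall()

# Puzzle 2: which day saw the single largest fare?
largest_fare_day = con.execute(
    """
    SELECT pickup_date, MAX(total_amount) AS largest_fare
    FROM trips
    GROUP BY pickup_date
    ORDER BY largest_fare DESC
    LIMIT 1
    """
).fetchone()

print(trips_per_day)
print(largest_fare_day)
```

Both puzzles lean on the same pattern: group the rows by day, aggregate within each group, then sort or limit the result.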
Cloudy Beginnings:
Taking our journey to new heights, we ventured into the clouds by creating our very own Google Cloud Platform (GCP) account. The sky’s the limit, and we’re ready to soar into the vast possibilities the cloud offers.
Terraforming the Future:
In the spirit of infrastructure as code, we welcomed Terraform and the GCP CLI into our toolkit. Creating GCP resources and then cleanly tearing them down again came down to a handful of commands, such as ‘terraform apply’ and ‘terraform destroy’, showcasing the elegance of automated infrastructure management.
As we say farewell to Week 1, the excitement for Week 2 is growing. We’re gearing up to unravel the mysteries of workflow orchestration with the intriguing promise of Mage.AI.
You can find the code related to this project in my GitHub repository.
Stay tuned for more updates and tips. Until then, happy coding and may your data always flow seamlessly!