Data Engineering Intern
At Picnic we care about health and safety, so currently all interviews are done online and our team is working from home. We keep a close eye on the developments around Covid-19, and support your relocation to Amsterdam when it is safe to do so. It is possible to start working remotely with us - have a peek at how the remote onboarding is going here.
Do you want to be part of the grocery revolution? At Picnic you get the chance to join a young and forward-thinking start-up culture, where everything you do matters. With a data-driven approach and an incredible service, Picnic strives to make grocery shopping simple, fun, and affordable for everyone. Known for our just-in-time supply-chain and a unique last-mile delivery strategy, we have become one of the fastest-growing companies of the Netherlands!
We rely on our Data Engineering team to glean insights from large data sets and promote business intelligence. Working with next-generation technologies, they write the future of in-app grocery ordering. We’re on a quest for brilliant interns to join our amazing team.
🏢 Where you fit in
Picnic is evolving. We’re a growing company constantly presented with new challenges. This creates a lot of room for inventive ideas. From perfecting our delivery routes and forecasting our growth to coming up with new strategies to reduce waste, making decisions in every area requires high-quality data. With hundreds of data sources and thousands of Data Warehouse tables, we need to stay on top of any data quality issue. As business complexity increases, it also becomes more challenging to trace insights back to the origin system and get a quick overview of the applied transformations. This is where you come in!
From day one, you take complete ownership over a project considered one of the most challenging practical topics in Data Engineering: Data Lineage. You will run the whole project from initiation to implementation. We count on you to come up with an innovative solution that will have a genuine impact on a rapidly growing company.
🔥 What challenges await you
- Work in a smart, young, and motivated team
- Learn directly from highly skilled Data Engineers, something no textbook can teach
- Evolve, develop, and challenge yourself in a young tech company
- Research and take ownership of a complex computer science solution
- Improve your awareness, interactivity and the ability to ask the right questions
- Apply knowledge gained from university courses to real-world challenges
👉🏼 Who you are
- Pursuing a Master’s degree in Computer Science, Machine Learning, Physics or a related field
- Affinity for data puzzles and insightful visualisations
- Mental Athleticism: Highly analytical and curious intellect
- Out-of-the-box Thinking and Initiative: Hands-on, nothing-is-impossible mindset
- Quick Learning: Intellectual horsepower to learn on the go
- Available for at least 6-7 months for a graduation internship or 4 months for a regular internship
👩💻 Technologies you'll use
- SQL on Snowflake
📖 Project details
“Development of a data lineage framework for the Data Warehouse (DWH) and Extract Load Transform (ELT) processes”
The Data Warehouse at Picnic is a very large database holding TBs of data. It consists of two main areas:
1. Data Vault: extracted data from operational systems, nearly raw
2. Dimensional Kimball Model: aggregated and enriched data from the Data Vault
Utilizing various means, data flows from operational systems via ELT processes to the Data Vault, within the Data Warehouse itself, and finally to external systems such as Tableau, Machine Learning (ML) models and operational systems.
The goal of this project is to build a framework able to represent and act upon a graph of the flowing data. Multiple approaches need to be combined, including:
- Lineage implicit links. Some movements of data can be automatically recognised, for example by parsing and interpreting SQL queries, analysing JSON schemas, etc.
- Lineage explicit (declarative) links. At some edges of our systems, there is no automatic way to know where some data was before or where it's being passed to. In those cases, adding metadata to ELT jobs about their source and target systems can enrich the lineage graph.
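To make the two kinds of links concrete, here is a minimal Python sketch of how they might be combined into one graph. It is an illustration only, not Picnic's framework: the table and system names are hypothetical, and the regular expressions are a deliberately naive stand-in for a real SQL parser (they ignore CTEs, subqueries, quoted identifiers, and comments).

```python
import re
from collections import defaultdict

# Naive patterns for illustration only; real SQL needs a proper parser.
TARGET_RE = re.compile(
    r"\b(?:insert\s+into|create\s+(?:or\s+replace\s+)?table)\s+([\w.]+)",
    re.IGNORECASE,
)
SOURCE_RE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)


def implicit_edges(sql):
    """Infer (source, target) pairs from one INSERT or CREATE TABLE AS statement."""
    match = TARGET_RE.search(sql)
    if match is None:
        return set()
    target = match.group(1).lower()
    sources = {name.lower() for name in SOURCE_RE.findall(sql)}
    return {(source, target) for source in sources if source != target}


def build_graph(sql_statements, declared_edges):
    """Map every node to the set of its direct upstream nodes."""
    upstream = defaultdict(set)
    for sql in sql_statements:
        for source, target in implicit_edges(sql):
            upstream[target].add(source)
    # Explicit links come from job metadata rather than from parsing.
    for source, target in declared_edges:
        upstream[target.lower()].add(source.lower())
    return upstream


def trace_origins(upstream, node):
    """Return all transitive upstream nodes of `node` (iterative depth-first search)."""
    seen, stack = set(), [node.lower()]
    while stack:
        for parent in upstream.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Hypothetical statements and metadata, invented for this example.
statements = [
    "INSERT INTO vault.orders SELECT * FROM staging.orders_raw",
    "CREATE TABLE dm.fact_sales AS SELECT o.id, c.region "
    "FROM vault.orders o JOIN vault.customers c ON o.customer_id = c.id",
]
declared = [
    ("orders_service", "staging.orders_raw"),
    ("dm.fact_sales", "tableau.sales_dashboard"),
]

graph = build_graph(statements, declared)
origins = trace_origins(graph, "tableau.sales_dashboard")
print(sorted(origins))
```

Storing upstream sets per node keeps the "where did this come from?" question cheap to answer; a production framework would likely also need downstream traversal, column-level lineage, and cycle-safe handling of self-referencing tables.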
This topic combines research with applied knowledge. You are expected to survey the state of the art in techniques for building and representing data lineage, and to come up with novel ways to solve issues in a domain that doesn't have a lot of academic literature. At the same time, you will actively contribute to our codebase by developing this framework.
Deliverables
- Documentation about the state of the art, existing approaches, and chosen solutions.
- Functional and extensible framework, configured to work with our current systems.
- Data Lineage survey
- Academic literature
- Expired Microsoft patent
- Picnic Tech Blog: Picnic’s Lakeless Data Warehouse
- Picnic Tech Blog: Data Engineer’s Role in the Future of Groceries
🍩 Picnic perks
In our modern Amsterdam office, you will have the freedom to experiment and evolve your own projects as well as the chance to test them on real customers. You will be part of an international, all-star team of ground-breaking entrepreneurs, brilliant engineers, and data wizards that work with desire and dedication each day. But don’t worry, at Picnic groceries aren’t a chore – we play as hard as we work! Besides all this, you will get 600 euros per month for a full-time internship.
- CV screening
- Phone conversation
- Online test*
- On-site day*
* Depending on the role