Shopify: How to Build a Production Grade Workflow with SQL Modelling

shopify.engineering

Shopify just wrote about their work with dbt!

I’ll show you how we moved to a SQL modelling workflow by leveraging dbt (data build tool) and created tooling for testing and documentation on top of it.

It's been a real privilege to work with the team at Shopify over the past months as they've really pushed dbt to its limits. Read how the org is rolling it out to move some data processing out of PySpark and into BigQuery, and how they've extended its functionality for unit testing and CI. We're highly aligned with their thinking here and really appreciate their willingness to move the ball forward independently. Hope to see some of this work live for the entire community at some point!

Michelle and Chris, if you're reading: I'd love to hear more about the incremental functionality you reference here:

dbt’s current incremental support doesn’t provide safe and consistent methods to handle late arriving data, key resolution, and rebuilds. For this reason, a handful of models (Type 2 dimensions or models in the 1.5B+ event territory) that required incremental semantics weren’t doable—for now. We’ve got big plans though!

This is an area that I have a long-standing interest in and I'd love to hear what you have cooking!

